To my family


Preface 


Numerous papers on system identification have been published over the last 40 years. Though there were substantial developments in the theory of stationary stochastic processes and multivariable statistical methods during the 1950s, it is widely recognized that the theory of system identification started only in the mid-1960s with the publication of two important papers: one due to Åström and Bohlin [17], in which the maximum likelihood (ML) method was extended to a serially correlated time series to estimate ARMAX models, and the other due to Ho and Kalman [72], in which the deterministic state space realization problem was solved for the first time using a certain Hankel matrix formed in terms of impulse responses. These two papers have laid the foundation for the future developments of system identification theory and techniques [55].

The scope of the ML identification method of Åström and Bohlin [17] was to build single-input, single-output (SISO) ARMAX models from observed input-output data sequences. Since the appearance of their paper, many statistical identification techniques have been developed in the literature, most of which are now comprised under the label of prediction error methods (PEM) or instrumental variable (IV) methods. This has culminated in the publication of the volumes Ljung [109] and Söderström and Stoica [145]. At this moment we can say that the theory of system identification for SISO systems is established, and the various identification algorithms have been well tested and are now available as MATLAB® programs.

However, identification of multi-input, multi-output (MIMO) systems is an important problem that is not dealt with satisfactorily by PEM methods. The identification problem based on the minimization of a prediction error criterion (or a least-squares type criterion), which in general is a complicated function of the system parameters, has to be solved by iterative descent methods, which may get stuck in local minima. Moreover, optimization methods need canonical parametrizations, and it may be difficult to guess a suitable canonical parametrization from the outset. Since no single continuous parametrization covers all possible multivariable linear systems with a fixed McMillan degree, it may be necessary to change parametrization in the course of the optimization routine. Thus the use of optimization criteria and canonical parametrizations can lead to local minima far from the true solution, and to numerically ill-conditioned problems due to poor identifiability, i.e., to near insensitivity of the criterion to the variations of some parameters. Hence it seems that the PEM method has inherent difficulties for MIMO systems.

On the other hand, stochastic realization theory, initiated by Faurre [46], Akaike [1], and others, has brought in a different philosophy of building models from data, one that is not based on optimization concepts. A key step in stochastic realization is either to apply the deterministic realization theory to a certain Hankel matrix constructed with sample estimates of the process covariances, or to apply canonical correlation analysis (CCA) to the future and past of the observed process. These algorithms have been shown to be implementable very efficiently and in a numerically stable way by using the tools of modern numerical linear algebra, such as the singular value decomposition (SVD).

Then, a new effort in digital signal processing and system identification based on the QR decomposition and the SVD emerged in the mid-1980s, and many papers have been published in the literature [100, 101, 118, 119], etc. These realization theory-based techniques have led to the development of various so-called subspace identification methods, including [163, 164, 169, 171-173], etc. Moreover, Van Overschee and De Moor [165] have published a first comprehensive book on subspace identification of linear systems. An advantage of subspace methods is that we need neither (nonlinear) optimization techniques nor a canonical form imposed on the system, so that subspace methods do not suffer from the inconveniences encountered in applying PEM methods to MIMO system identification.

Though I have been interested in stochastic realization theory for many years, it was around 1990 that I actually resumed studies on realization theory, including subspace identification methods. However, realization results developed for deterministic systems on the one hand, and stochastic systems on the other, could not be applied to the identification of dynamic systems in which both a deterministic test input and a stochastic disturbance are involved. In fact, the deterministic realization result does not consider any noise, and the stochastic realization theory developed up to the early 1990s addressed the modeling of stochastic processes, or time series, only. I soon noticed that we needed a new realization theory to understand many existing subspace methods and their underlying relations, and to develop advanced algorithms. Thus I was fully convinced that a new stochastic realization theory in the presence of exogenous inputs was needed for further developments of subspace system identification theory and algorithms.

While we were attending the MTNS (International Symposium on Mathematical Theory of Networks and Systems) at Regensburg in 1993, I suggested to Giorgio Picci, University of Padova, that we should do joint work on stochastic realization theory in the presence of exogenous inputs, and a collaboration between us started in 1994 when he stayed at Kyoto University as a visiting professor. I subsequently visited him at the University of Padova in 1997. The collaboration has resulted in several joint papers [87-90, 93, 130, 131]. Professor Picci has in particular introduced the idea of decomposing the output process into deterministic and stochastic components by using a preliminary orthogonal decomposition, and then applying the existing deterministic and stochastic realization techniques to each component to get a realization theory in the presence of exogenous input. On the other hand, inspired by the CCA-based approach, I have developed a method of solving a multi-stage Wiener prediction problem to derive an innovation representation of the stationary process with an observable exogenous input, from which subspace identification methods are successfully obtained.

This book is an outgrowth of the joint work with Professor Picci on stochastic realization theory and subspace identification. It provides an in-depth introduction to subspace methods for system identification of discrete-time linear systems, together with our results on realization theory in the presence of exogenous inputs and subspace system identification methods. I have included proofs of theorems and lemmas as much as possible, as well as solutions to problems, in order to facilitate the basic understanding of the material by the readers and to minimize the effort needed to consult many references.

This textbook is divided into three parts: Part I reviews basic results, from numerical linear algebra to Kalman filtering, to be used throughout this book; Part II provides the deterministic and stochastic realization theories developed by Ho and Kalman, Faurre, and Akaike; and Part III discusses stochastic realization results in the presence of exogenous inputs and their adaptation to subspace identification methods; see Section 1.6 for more details. Thus, various people can read this book according to their needs. For example, people with a good knowledge of linear system theory and Kalman filtering can begin with Part II. Also, people mainly interested in applications can just read the algorithms of the various identification methods in Part III, occasionally returning to Part I and/or Part II when needed. I believe that this textbook should be suitable for advanced students, applied scientists, and engineers who want to acquire solid knowledge and algorithms of subspace identification methods.

I would like to express my sincere thanks to Giorgio Picci, who has greatly contributed to our fruitful collaboration on stochastic realization theory and subspace identification methods over the last ten years. I am deeply grateful to Hideaki Sakai, who has read the whole manuscript carefully and provided invaluable suggestions, which have led to many changes in the manuscript. I am also grateful to Kiyotsugu Takaba and Hideyuki Tanaka for their useful comments on the manuscript. I have benefited from joint works with Takahira Ohki, Toshiaki Itoh, Morimasa Ogawa, and Hajime Ase, who told me about many problems regarding modeling and identification of industrial processes.

The related research from 1996 through 2004 has been sponsored by the Grant- 
in-Aid for Scientific Research, the Japan Society of Promotion of Sciences, which is 
gratefully acknowledged. 


Tohru Katayama 
Kyoto, Japan 
January 2005 
Introduction


In this introductory chapter, we briefly review the classical prediction error method (PEM) for identifying linear time-invariant (LTI) systems. We then discuss the basic idea of subspace methods of system identification, together with the advantages of subspace methods over the PEM as applied to multivariable dynamic systems.


1.1 System Identification 


Figure 1.1 shows a schematic diagram of a dynamic system with input $u$, output $y$, and disturbance $v$. We can observe $u$ and $y$ but not $v$; we can directly manipulate the input $u$ but not $y$. Even if we do not know the internal structure of the system, the measured input and output data provide useful information about the system behavior. Thus, we can construct mathematical models to describe the dynamics of the system of interest from observed input-output data.


Figure 1.1. A system with input and disturbance 


Dynamic models for prediction and control include transfer functions, state space models, and time-series models, which are parametrized in terms of a finite number of parameters. Thus these dynamic models are referred to as parametric models. Also used are non-parametric models such as impulse responses, frequency responses, spectral density functions, etc.

System identification is a methodology developed mainly in the area of automatic control, by which we can choose the best model(s) from a given model set based on the observed input-output data from the system. Hence, the problem of system identification is specified by three elements [109]:


• A data set $\mathcal{D}$ obtained by input-output measurements.
• A model set $\mathcal{M}$, or a model structure, containing candidate models.
• A criterion, or loss, function $\mathcal{L}$ to select the best model(s), or a rule to evaluate candidate models, based on the data.


The input-output data $\mathcal{D}$ are collected through experiments. In this case, we must design the experiment by deciding the input (or test) signals, the output signals to be measured, the sampling interval, etc., so that the system characteristics are well reflected in the observed data. Thus, to obtain useful data for system identification, we should have some a priori information, or some physical knowledge, about the system. Also, there are cases where we cannot perform open-loop experiments for safety, technical, and/or economic reasons, so that we can only use data measured under normal operating conditions.

The choice of the model set $\mathcal{M}$ is a difficult issue in system identification, but usually several classes of discrete-time linear time-invariant (LTI) systems are used. Since these models do not necessarily reflect knowledge about the structure of the system, they are referred to as black-box models. One of the most difficult problems is to find a good model structure, or to fix the orders of the models, based on the given input-output data. A solution to this problem is given by the Akaike information criterion (AIC) [3].
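As a rough illustration of order selection with the AIC, the sketch below fits autoregressive (AR) models of increasing order by least squares and scores each fit with the common Gaussian form $N\log\hat\sigma^2 + 2d$, where $d$ is the number of estimated coefficients and additive constants are dropped. The AR model class, the simulated second-order data, and the helper names `fit_ar` and `aic` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def fit_ar(y, d, d_max):
    """Least-squares fit of y(t) = a_1 y(t-1) + ... + a_d y(t-d) + e(t).

    Every order uses the same samples t = d_max, ..., N-1, so that the
    resulting criteria are comparable across orders."""
    N = len(y)
    # Column i-1 holds y(t-i) for t = d_max, ..., N-1
    Phi = np.column_stack([y[d_max - i:N - i] for i in range(1, d + 1)])
    target = y[d_max:]
    coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    sigma2 = np.mean((target - Phi @ coef) ** 2)  # residual variance
    return coef, sigma2

def aic(y, d, d_max):
    """Gaussian AIC up to an additive constant: N_eff log(sigma^2) + 2 d."""
    _, sigma2 = fit_ar(y, d, d_max)
    return (len(y) - d_max) * np.log(sigma2) + 2 * d

# Simulated data from a stable second-order AR process (illustrative only)
rng = np.random.default_rng(0)
N = 2000
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + e[t]

scores = {d: aic(y, d, d_max=6) for d in range(1, 7)}
best_order = min(scores, key=scores.get)
```

Because the penalty grows only linearly in $d$, the AIC may occasionally pick an order slightly above the true one; an underfitted model, by contrast, is penalized heavily through its residual variance.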

Also, by using some physical principles, we can construct models that contain 
several unknown parameters. These models are called gray-box models because 
some basic laws from physics are employed to describe the dynamics of a system 
or a phenomenon. 

The next step is to find a model in the model set $\mathcal{M}$ by which the experimental data are best explained. To this end, we need a criterion to measure the distance between a model and a real system; the criterion should have a physical meaning and be simple enough to be handled mathematically. In terms of the input $u$, the output $y$ of a real system, and the model output $y_M$, the criterion is usually defined as

$$V_N = \sum_{t=0}^{N-1} l\bigl(y(t), y_M(t), u(t)\bigr)$$

where $l(\cdot)$ is a nonnegative loss function, and $N$ the number of data. If the model set is parametrized as $\mathcal{M} = \{M_\alpha,\ \alpha \in \mathcal{A}\}$, then identification in the narrow sense reduces to an optimization problem of minimizing the criterion $V_N$ with respect to $\alpha$.
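For a concrete picture of identification in the narrow sense, the minimal sketch below takes the quadratic loss $l = (y(t) - y_M(t))^2$ and a hypothetical one-parameter model set of static gains $y_M(t) = \alpha\,u(t)$, with $\alpha$ restricted to a finite grid, and picks the $\alpha$ that minimizes $V_N$. Both the model set and the data are made up for illustration.

```python
def criterion(y, y_model):
    """V_N = sum_t l(y(t), y_M(t)) with the quadratic loss l = (y - y_M)^2."""
    return sum((yt - ymt) ** 2 for yt, ymt in zip(y, y_model))

def best_gain(u, y, alphas):
    """Return the alpha in the finite model set minimizing V_N for y_M(t) = alpha u(t)."""
    def v_n(alpha):
        return criterion(y, [alpha * ut for ut in u])
    return min(alphas, key=v_n)

u = [1.0, -0.5, 2.0, 0.3, -1.2]
y = [2.1, -0.9, 4.0, 0.7, -2.5]        # roughly y = 2 u plus small errors
alphas = [0.5 * k for k in range(9)]   # model set {M_alpha}: alpha = 0, 0.5, ..., 4
print(best_gain(u, y, alphas))         # 2.0
```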

Given the three basic elements of system identification, we can in principle find the best model $M^* \in \mathcal{M}$. In this case, we need

• A condition for the existence of a model that minimizes the criterion.
• An algorithm for computing models.
• A method of model validation.
In particular, model validation determines whether or not an identified model should be accepted as a suitable description that explains the dynamics of a system. Thus, model validation is based on the way in which the model is used, a priori information on the system, the fit of the model to real data, etc. For example, if we identify the transfer function of a system, the quality of an identified model is evaluated based on the step response and/or the pole-zero configuration. Furthermore, if the ultimate goal is to design a control system, then we must evaluate the control performance of a system designed with the identified model. If the performance is not satisfactory, we must go back to some earlier stages of system identification, including the selection of the model structure, the experiment design, etc. A flow diagram of system identification is displayed in Figure 1.2, where we see that the system identification procedure has an inherent iterative or feedback structure.


[Figure 1.2 diagram: Experiment design, Parameter estimation, Model validation, A priori knowledge]

Figure 1.2. A flow diagram of system identification [109, 145] 


Models obtained by system identification are valid under some prescribed conditions, e.g., only in a certain neighborhood of a working point, and they do not provide physical insight into the system because the parameters in the model have no physical meaning. It should be noted that it is engineering skill and deep insight into the system, shown as a priori knowledge in Figure 1.2, that help us to construct mathematical models based on ill-conditioned data. As shown in Figure 1.2, we cannot get a desired model unless we iteratively evaluate identified models by trying several model structures, model orders, etc. At this stage, the AIC plays a very important role in that it can automatically select the best model based on the given input-output data in the sense of maximum likelihood (ML) estimation.

It is well known that real systems of interest are nonlinear and time-varying, that they may contain delays, and that some variables or signals of central importance may not be measured. It is also true that LTI systems are the simplest and most important class of dynamic systems used in practice and in the literature [109]. Though they are nothing but idealized models, our experience shows that they can approximate many industrial processes well. Besides, control design methods based on LTI models often lead to good results. It should also be emphasized that system identification is a technique of approximating real systems by means of our models, since there is no “true” system in practical applications [4].


1.2 Classical Identification Methods 
Let the “true” system be represented by

$$(\mathrm{S}) \qquad y(t) = P_0(z)u(t) + v_0(t)$$

where $P_0(z)$ is the “true” plant, and $v_0(t)$ is the output disturbance. Suppose that we want to fit a stochastic single-input, single-output (SISO) LTI model (Figure 1.3)

$$(\mathrm{M}) \qquad y(t) = P(z,\theta)u(t) + H(z,\theta)e(t)$$

to a given set of input-output data, where $e$ is a white noise with mean 0 and variance $\sigma^2$, and $\theta \in \mathbb{R}^{d}$ contains all unknown parameters other than the noise variance. It may also be noted that the noise term includes the effect of unmeasurable disturbances, modeling errors, etc.


Figure 1.3. An SISO transfer function model 


The transfer function of the plant model is usually given by

$$P(z,\theta) = \frac{B(z,\theta)}{A(z,\theta)} = \frac{b_1 z^{-1} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + \cdots + a_n z^{-n}}$$

where, if the plant has a delay, the parameters $b_1, \ldots, b_l$ with $l \ge 1$ reduce to zero. Also, the transfer function of the noise model is

$$H(z,\theta) = \frac{C(z,\theta)}{D(z,\theta)} = \frac{1 + c_1 z^{-1} + \cdots + c_p z^{-p}}{1 + d_1 z^{-1} + \cdots + d_q z^{-q}} \qquad (1.1)$$

where $H(z,\theta)$ is of minimum phase with $H(\infty,\theta) = 1$.
Suppose that we have observed a sequence of input-output data. Let the input-output data up to time $t-1$ be defined by

$$Z^{t-1} := \{u(k), y(k),\ k = 0, 1, \ldots, t-1\}$$

Then, it can be shown [109] that the one-step predicted estimate of the output $y(t)$ based on $Z^{t-1}$ is given by

$$\hat y(t,\theta) = H^{-1}(z,\theta)P(z,\theta)u(t) + \bigl[1 - H^{-1}(z,\theta)\bigr]y(t)$$


Moreover, we define the one-step prediction error $\varepsilon(t,\theta) := y(t) - \hat y(t,\theta)$. Then, it can be expressed as

$$\varepsilon(t,\theta) = H^{-1}(z,\theta)\bigl[P_0(z) - P(z,\theta)\bigr]u(t) + H^{-1}(z,\theta)v_0(t) \qquad (1.2)$$


Suppose that we have a set of data $Z^{N-1}$. If we specify a particular value of the parameter $\theta$, then from (1.2) we can obtain a sequence of prediction errors

$$\{\varepsilon(t,\theta),\ t = 0, 1, \ldots, N-1\}$$

where the initial conditions $\{\varepsilon(t,\theta),\ t = -1, \ldots, -p\}$ should be given. When we fit a model to the data $Z^{N-1}$, a principle of estimation is to select the $\theta$ that produces the minimum variance of the prediction error. Thus the criterion function is given by


$$V_N(\theta) = \frac{1}{N}\sum_{t=0}^{N-1}\varepsilon^2(t,\theta) \qquad (1.3)$$


A schematic diagram of the prediction error method (PEM) is displayed in Figure 1.4. Algorithms designed so that a function of the prediction errors is minimized are commonly called the PEM. Since the performance criterion (1.3) is in general a complicated function of the system parameters, the problem has to be solved by iterative descent methods, which may get stuck in local minima.


[Figure 1.4 diagram: Plant, prediction error $\varepsilon(t)$, and the block $\min V_N(\theta)$]


Figure 1.4. Prediction error method 


Example 1.1. Let $H(z) = C(z)/A(z)$ in (1.1). Then we get the ARMAX (AutoRegressive Moving Average with eXogenous input) model of the form

$$A(z)y(t) = B(z)u(t) + C(z)e(t) \qquad (1.4)$$


where the unknown parameters are $\theta = (a_1 \cdots a_n\ b_1 \cdots b_m\ c_1 \cdots c_p)^T$ and the noise variance $\sigma^2$. Then, the one-step prediction error for the ARMAX model (1.4) is expressed as

$$\varepsilon(t,\theta) = [C(z,\theta)]^{-1}\bigl[A(z,\theta)y(t) - B(z,\theta)u(t)\bigr] \qquad (1.5)$$


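Obviously, the polynomial $C(z,\theta)$ should be stable in order to get a sequence of prediction errors. Substituting (1.5) into (1.3) yields

$$V_N(\theta) = \frac{1}{N}\sum_{t=0}^{N-1}\left[\frac{A(z,\theta)}{C(z,\theta)}y(t) - \frac{B(z,\theta)}{C(z,\theta)}u(t)\right]^2$$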


Thus, in this case, the PEM reduces to a nonlinear optimization problem of minimizing the performance index $V_N(\theta)$ with respect to the parameter vector $\theta$ under the constraint that $C(z,\theta)$ is stable.
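The prediction errors (1.5) can be generated recursively, since $C(z,\theta)\varepsilon(t,\theta) = A(z,\theta)y(t) - B(z,\theta)u(t)$ is a difference equation in $\varepsilon$. The sketch below assumes zero initial conditions (all samples before $t = 0$ set to zero) and stores $A = 1 + a_1z^{-1} + \cdots$, $B = b_1z^{-1} + \cdots$, $C = 1 + c_1z^{-1} + \cdots$ as coefficient lists; the function names and the simulated first-order example are illustrative assumptions. With the true parameters and matching initial conditions the prediction errors reproduce the simulated white noise, while a wrong $\theta$ inflates $V_N(\theta)$. Note that $V_N$ depends nonlinearly on the $c_i$, which is why iterative search is needed.

```python
import random

def armax_prediction_errors(y, u, a, b, c):
    """eps(t) from C(z) eps(t) = A(z) y(t) - B(z) u(t), cf. (1.5).

    a = [a_1, ..., a_n], b = [b_1, ..., b_m], c = [c_1, ..., c_p];
    samples with negative time index are taken as zero."""
    eps = []
    for t in range(len(y)):
        val = y[t]
        val += sum(ai * y[t - i] for i, ai in enumerate(a, start=1) if t >= i)
        val -= sum(bi * u[t - i] for i, bi in enumerate(b, start=1) if t >= i)
        val -= sum(ci * eps[t - i] for i, ci in enumerate(c, start=1) if t >= i)
        eps.append(val)
    return eps

def criterion(y, u, a, b, c):
    """V_N(theta) = (1/N) sum_t eps(t, theta)^2, cf. (1.3)."""
    eps = armax_prediction_errors(y, u, a, b, c)
    return sum(x * x for x in eps) / len(eps)

# Simulate A(z) y = B(z) u + C(z) e with A = 1 - 0.8 z^-1, B = z^-1, C = 1 + 0.5 z^-1
random.seed(1)
N = 500
a_true, b_true, c_true = [-0.8], [1.0], [0.5]
u = [random.choice([-1.0, 1.0]) for _ in range(N)]
e = [random.gauss(0.0, 0.1) for _ in range(N)]
y = []
for t in range(N):
    yt = e[t]
    yt -= sum(ai * y[t - i] for i, ai in enumerate(a_true, start=1) if t >= i)
    yt += sum(bi * u[t - i] for i, bi in enumerate(b_true, start=1) if t >= i)
    yt += sum(ci * e[t - i] for i, ci in enumerate(c_true, start=1) if t >= i)
    y.append(yt)

print(criterion(y, u, a_true, b_true, c_true))   # close to the noise variance 0.01
print(criterion(y, u, [-0.5], b_true, c_true))   # noticeably larger
```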


For a detailed exposition of the PEM, including a frequency domain interpretation and the convergence analysis of the estimate, see [109, 145].


1.3 Prediction Error Method for State Space Models 


Consider an innovation representation of a discrete-time LTI system of the form

$$x(t+1) = Ax(t) + Bu(t) + Ke(t) \qquad (1.6\mathrm{a})$$
$$y(t) = Cx(t) + Du(t) + e(t) \qquad (1.6\mathrm{b})$$

where $y \in \mathbb{R}^p$ is the output vector, $u \in \mathbb{R}^m$ the input vector, $x \in \mathbb{R}^n$ the state vector, $e \in \mathbb{R}^p$ the innovation vector with mean zero and covariance matrix $R > 0$, and $(A, B, C, D, K)$ are matrices of appropriate dimensions. The unknown parameters in the state space model are contained in these system matrices and the covariance matrix $R$ of the innovation process.

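Consider the application of the PEM to the multi-input, multi-output (MIMO) model (1.6). In view of Theorem 5.2, the prediction error $\varepsilon(t,\theta)$ is computed by a linear state space model with inputs $u(t)$, $y(t)$ of the form

$$\hat x(t+1,\theta) = [A(\theta) - K(\theta)C(\theta)]\hat x(t,\theta) + [B(\theta) - K(\theta)D(\theta)]u(t) + K(\theta)y(t)$$
$$\varepsilon(t,\theta) = -C(\theta)\hat x(t,\theta) - D(\theta)u(t) + y(t)$$

with the initial condition $\hat x(0,\theta) = 0$. Then, in terms of $\varepsilon(t,\theta)$, the performance index is given by

$$V_N(\theta) = \frac{1}{N}\sum_{t=0}^{N-1}\|\varepsilon(t,\theta)\|^2$$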
Thus the PEM estimates are obtained by minimizing $V_N(\theta)$ with respect to $\theta$, and the covariance matrix $R$ of $e$ is estimated by computing the sample covariance matrix of $\varepsilon(t,\hat\theta)$, $t = 0, 1, \ldots, N-1$.
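The predictor is easy to run for a given parameter value. The sketch below, with hypothetical helper names and a made-up second-order system, propagates $\hat x(t+1) = A\hat x(t) + Bu(t) + K\varepsilon(t)$, which is algebraically the same as the recursion with $A - KC$ and $B - KD$, and returns $V_N(\theta)$ together with the sample covariance of the prediction errors as an estimate of $R$. Data simulated from (1.6) with $x(0) = 0$ reproduce the innovations exactly when the true matrices are used.

```python
import numpy as np

def prediction_errors(A, B, C, D, K, u, y):
    """eps(t, theta) for the innovation model (1.6), with x_hat(0) = 0.

    u has shape (N, m) and y has shape (N, p); row t is the sample at time t."""
    x = np.zeros(A.shape[0])
    eps = np.empty_like(y)
    for t in range(y.shape[0]):
        eps[t] = y[t] - C @ x - D @ u[t]
        # Equivalent to (A - K C) x + (B - K D) u + K y
        x = A @ x + B @ u[t] + K @ eps[t]
    return eps

def pem_criterion(A, B, C, D, K, u, y):
    """Return V_N(theta) = (1/N) sum ||eps||^2 and the sample covariance of eps."""
    eps = prediction_errors(A, B, C, D, K, u, y)
    N = eps.shape[0]
    V = np.sum(eps ** 2) / N
    R_hat = eps.T @ eps / N
    return V, R_hat

# Hypothetical system with n = 2, m = 1, p = 1
A = np.array([[0.7, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1]])
K = np.array([[0.3], [0.1]])

rng = np.random.default_rng(0)
N = 400
u = rng.standard_normal((N, 1))
e = 0.2 * rng.standard_normal((N, 1))
x = np.zeros(2)
y = np.empty((N, 1))
for t in range(N):
    y[t] = C @ x + D @ u[t] + e[t]
    x = A @ x + B @ u[t] + K @ e[t]

V, R_hat = pem_criterion(A, B, C, D, K, u, y)
```

In an actual PEM, `pem_criterion` would be called repeatedly by a numerical optimizer over a chosen parametrization of $(A, B, C, D, K)$, which is exactly where the parametrization issues discussed next arise.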

If we can evaluate the gradient $\partial V_N/\partial\theta$, we can in principle compute a (local) minimum of the criterion $V_N(\theta)$ by utilizing a (conjugate) gradient method. However, optimization methods need canonical parametrizations, and it may be difficult to guess a suitable canonical parametrization from the outset. Since no single continuous parametrization covers all possible multivariable linear systems with a fixed McMillan degree, it may be necessary to change parametrization in the course of the optimization routine.

Even if this difficulty can be tackled by using overlapping parametrizations or pseudo-canonical forms, considerable complications in the algorithm generally result. Thus the use of optimization criteria and canonical parametrizations can lead to local minima far from the true solution, to complicated algorithms for switching between canonical forms, and to numerically ill-conditioned problems due to poor identifiability, i.e., to near insensitivity of the criterion to the variations of some parameters. Hence it seems that the PEM method has inherent difficulties for MIMO systems.

It is well known that for a given triplet $(n, m, p)$ there does not exist a global canonical MIMO linear state space form [57, 67]. There is, however, some interest in deriving a convenient parametrization for MIMO systems called an overlapping parametrization, or pseudo-canonical form [54, 68].


Example 1.2. Consider the state space model (1.6). A MIMO observable pseudo-canonical form with $(p = 3, m = 4, n = 9)$ can be given by


[Matrix display not recoverable from the source: $A$ ($9 \times 9$), $B$ ($9 \times 4$), $C$ ($3 \times 9$), $D$ ($3 \times 4$), and $K$ ($9 \times 3$), with free entries marked x.]


where x indicates independent parameters. See Appendix C, where an overlapping 
parametrization is derived for a stochastic system. 

The pair $(C, A)$ is observable by definition, but the reachability of the pairs $(A, B)$ and $(A, K)$ depends on the actual values of the parameters. We see that $A$ has $pn$ independent parameters and all the elements of $(B, D, K)$ are independent parameters, while the parameters in $C$ are fixed. Thus the number of unknown parameters in the above overlapping parametrization is $N_{\mathrm{overlap}} = n(2p+m) + pm$. On the other hand, the total number of parameters in $(A, B, C, D, K)$ is $N_p = n^2 + n(2p+m) + pm$, so that we save $n^2$ parameters by using the above overlapping parametrization.
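The parameter count is plain arithmetic; the small helper below (a hypothetical name) reproduces it for the dimensions of Example 1.2.

```python
def parameter_counts(n, m, p):
    """Free parameters in (A, B, C, D, K): fully parametrized vs. overlapping form."""
    full = n * n + n * m + p * n + p * m + n * p   # A, B, C, D, K all free
    overlapping = p * n + n * m + p * m + n * p    # A has pn free entries, C is fixed
    return full, overlapping

full, overlapping = parameter_counts(n=9, m=4, p=3)
print(full, overlapping, full - overlapping)       # 183 102 81
```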


Recently, data driven local coordinates, which are closely related to overlapping parametrizations, have been introduced by McKelvey et al. [114].


1.4 Subspace Methods of System Identification 


In this section, we glance at some basic ideas in subspace identification methods. For 
more detail, see Chapter 6. 


Basic Idea of Subspace Methods 


Subspace identification methods are based on the following idea. Suppose that an estimate of a sequence of state vectors of the state space model (1.6) is somehow constructed from the observed input-output data (see below). Then, for $t = 0, 1, \ldots, N-1$, we have


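$$\begin{bmatrix}\hat x(t+1) \\ y(t)\end{bmatrix} = \begin{bmatrix} A & B \\ C & D\end{bmatrix}\begin{bmatrix}\hat x(t) \\ u(t)\end{bmatrix} + \begin{bmatrix} w(t) \\ v(t)\end{bmatrix} \qquad (1.7)$$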


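where $\hat x \in \mathbb{R}^n$ is the estimate of the state vector, $u \in \mathbb{R}^m$ the input, $y \in \mathbb{R}^p$ the output, and $w$, $v$ are residuals. It may be noted that since all the variables are given, (1.7) is a regression model for the system parameters $\Theta := \begin{bmatrix} A & B \\ C & D\end{bmatrix} \in \mathbb{R}^{(n+p)\times(n+m)}$. Thus the least-squares estimate of $\Theta$ is given by

$$\hat\Theta = \left(\sum_{t=0}^{N-1}\begin{bmatrix}\hat x(t+1) \\ y(t)\end{bmatrix}\begin{bmatrix}\hat x^T(t) & u^T(t)\end{bmatrix}\right)\left(\sum_{t=0}^{N-1}\begin{bmatrix}\hat x(t) \\ u(t)\end{bmatrix}\begin{bmatrix}\hat x^T(t) & u^T(t)\end{bmatrix}\right)^{-1}$$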


This class of approaches is called the direct N4SID methods [175]. We see that this estimate uniquely exists if the rank condition

$$\operatorname{rank}\begin{bmatrix}\hat x(0) & \hat x(1) & \cdots & \hat x(N-1) \\ u(0) & u(1) & \cdots & u(N-1)\end{bmatrix} = n + m$$


is satisfied. This condition, discussed some 30 years ago by Gopinath [62], plays an 
important role in subspace identification as well; see Section 6.3. 
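A minimal numerical sketch of the direct approach, assuming the state sequence is already available: here it is simply simulated from a made-up noise-free system, so the true states stand in for the estimates $\hat x(t)$. The rank condition is checked first, and $\hat\Theta$ is then obtained by solving the least-squares problem with `numpy.linalg.lstsq` instead of forming the inverse explicitly. The residual covariance, which plays the role of $\hat Q$, $\hat S$, $\hat R$, is returned as well; the function name is hypothetical.

```python
import numpy as np

def estimate_theta(X, U, Y):
    """Least-squares estimate of Theta = [[A, B], [C, D]] from the regression (1.7).

    X: states x(0..N) as columns, shape (n, N+1); U: inputs, shape (m, N);
    Y: outputs, shape (p, N). Returns Theta_hat and the residual covariance."""
    N = U.shape[1]
    lhs = np.vstack([X[:, 1:N + 1], Y])   # columns [x(t+1); y(t)], t = 0..N-1
    rhs = np.vstack([X[:, :N], U])        # columns [x(t); u(t)]
    if np.linalg.matrix_rank(rhs) < rhs.shape[0]:
        raise ValueError("rank condition on [x(t); u(t)] is not satisfied")
    # Solve rhs^T Theta^T = lhs^T in the least-squares sense
    theta_T, *_ = np.linalg.lstsq(rhs.T, lhs.T, rcond=None)
    theta = theta_T.T
    resid = lhs - theta @ rhs             # columns [w(t); v(t)]
    return theta, resid @ resid.T / N     # estimate of [[Q, S], [S^T, R]]

# Made-up noise-free system: n = 2, m = 1, p = 1
A = np.array([[0.8, 0.1], [0.0, 0.6]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2]])
rng = np.random.default_rng(0)
N = 200
U = rng.standard_normal((1, N))
X = np.zeros((2, N + 1))
X[:, 0] = rng.standard_normal(2)
for t in range(N):
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t]
Y = C @ X[:, :N] + D @ U

theta, cov = estimate_theta(X, U, Y)
```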
Moreover, the covariance matrices of the residuals are given by 


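$$\begin{bmatrix}\hat Q & \hat S \\ \hat S^T & \hat R\end{bmatrix} = \frac{1}{N}\sum_{t=0}^{N-1}\begin{bmatrix} w(t) \\ v(t)\end{bmatrix}\begin{bmatrix} w^T(t) & v^T(t)\end{bmatrix}$$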
Thus, by solving a certain algebraic Riccati equation, we can derive a steady state Kalman filter (or an innovation model) of the form

$$\begin{bmatrix}\hat x(t+1) \\ y(t)\end{bmatrix} = \begin{bmatrix} A & B \\ C & D\end{bmatrix}\begin{bmatrix}\hat x(t) \\ u(t)\end{bmatrix} + \begin{bmatrix} K \\ I_p\end{bmatrix}\hat e(t)$$

where $K$ is the steady state Kalman gain, $\hat x$ is the estimate of the state vector, and $\hat e$ is the estimate of the innovation process.
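The Riccati step can be sketched as a fixed-point iteration. Assuming the residual covariances $\hat Q$, $\hat S$, $\hat R$ play the roles of $\mathrm{E}[ww^T]$, $\mathrm{E}[wv^T]$, $\mathrm{E}[vv^T]$, the filter-form recursion below is iterated until it settles, and $K$ is read off from the limit. The function name, the tolerance, and the plain iteration are illustrative choices; in practice a dedicated DARE solver (for example `scipy.linalg.solve_discrete_are`) would normally be preferred.

```python
import numpy as np

def kalman_gain(A, C, Q, R, S, tol=1e-12, max_iter=10_000):
    """Steady-state Kalman gain via the Riccati recursion
        P <- A P A^T + Q - (A P C^T + S)(C P C^T + R)^{-1}(A P C^T + S)^T,
    returning K = (A P C^T + S)(C P C^T + R)^{-1} and the limit P."""
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        G = A @ P @ C.T + S
        K = np.linalg.solve(C @ P @ C.T + R, G.T).T   # G (C P C^T + R)^{-1}
        P_next = A @ P @ A.T + Q - K @ G.T
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    G = A @ P @ C.T + S
    K = np.linalg.solve(C @ P @ C.T + R, G.T).T
    return K, P

# Scalar check: a = 0.9, c = 1, q = r = 1, s = 0
K, P = kalman_gain(np.array([[0.9]]), np.array([[1.0]]),
                   np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]]))
```

In the scalar case the fixed point solves $P^2 - 0.81P - 1 = 0$, so $P = (0.81 + \sqrt{0.81^2 + 4})/2 \approx 1.484$ and $K = 0.9P/(P + 1) \approx 0.538$.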


Computation of State Vectors 


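We explain how we compute the estimate of state vectors by the LQ decomposition; this is a basic technique in subspace identification methods (Section 6.6). Suppose that we have input-output data from an LTI system. Let the block Hankel matrices be defined by

$$U_{0|k-1} := \begin{bmatrix} u(0) & u(1) & \cdots & u(N-1) \\ u(1) & u(2) & \cdots & u(N) \\ \vdots & \vdots & \ddots & \vdots \\ u(k-1) & u(k) & \cdots & u(N+k-2)\end{bmatrix} \in \mathbb{R}^{km\times N}$$

and

$$Y_{0|k-1} := \begin{bmatrix} y(0) & y(1) & \cdots & y(N-1) \\ y(1) & y(2) & \cdots & y(N) \\ \vdots & \vdots & \ddots & \vdots \\ y(k-1) & y(k) & \cdots & y(N+k-2)\end{bmatrix} \in \mathbb{R}^{kp\times N}$$

where $k > n$ and $N$ is sufficiently large.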
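The block Hankel matrices are straightforward to assemble. In the sketch below (the helper name `block_hankel` is an assumption), a signal is stored as an array whose row $t$ is the sample at time $t$, and the call with starting index $i$ returns $W_{i|i+k-1}$; the past and future data matrices are then the calls with $i = 0$ and $i = k$.

```python
import numpy as np

def block_hankel(w, i, k, N):
    """Block Hankel matrix W_{i|i+k-1} with k block rows and N columns.

    w has shape (T, d), row t being the sample w(t). Block row j
    (j = 0, ..., k-1) is [w(i+j), w(i+j+1), ..., w(i+j+N-1)]."""
    T = w.shape[0]
    if i + k + N - 1 > T:
        raise ValueError("need at least i + k + N - 1 samples")
    return np.vstack([w[i + j:i + j + N].T for j in range(k)])

# Scalar example: u(t) = t for t = 0, ..., 9, with k = 3 and N = 5
u = np.arange(10.0).reshape(-1, 1)
k, N = 3, 5
U_p = block_hankel(u, 0, k, N)   # U_{0|2}
U_f = block_hankel(u, k, k, N)   # U_{3|5}
print(U_p)                       # rows u(0..4), u(1..5), u(2..6)
```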

For notational convenience, let $p$ and $f$ denote the past and future, respectively. Then, we define the past data as $U_p := U_{0|k-1}$ and $Y_p := Y_{0|k-1}$. Similarly, we define the future data as $U_f := U_{k|2k-1}$ and $Y_f := Y_{k|2k-1}$. Let the LQ decomposition be given by


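$$\begin{bmatrix} U_f \\ W_p \\ Y_f\end{bmatrix} = \begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33}\end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T\end{bmatrix}$$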


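where $R_{11} \in \mathbb{R}^{km\times km}$, $R_{22} \in \mathbb{R}^{k(m+p)\times k(m+p)}$, and $R_{33} \in \mathbb{R}^{kp\times kp}$ are lower triangular, and the $Q_i$, $i = 1, 2, 3$, have orthonormal columns with $Q_i^T Q_j = 0$ for $i \ne j$. Then, from Theorem 6.3, we see that the oblique projection of the future $Y_f$ onto the joint past $W_p := \begin{bmatrix} U_p \\ Y_p\end{bmatrix}$ along the future $U_f$ is given by

$$\xi := \hat E_{\parallel U_f}\{Y_f \mid W_p\} = R_{32}R_{22}^{\dagger}W_p$$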


where $(\cdot)^{\dagger}$ denotes the pseudo-inverse. We can show that $\xi$ can be factored as a product of the extended observability matrix $\mathcal{O}_k$ and the future state vector $X_f := [x(k)\ \cdots\ x(k+N-1)] \in \mathbb{R}^{n\times N}$. It thus follows that

$$\xi = \mathcal{O}_k X_f = R_{32}R_{22}^{\dagger}W_p$$
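The LQ route to $\xi$ can be sketched numerically. Assumptions: the LQ factors are taken from `numpy.linalg.qr` applied to the transpose of the stacked data matrix (so $L = R^T$ is lower triangular), the pseudo-inverse of $R_{22}$ uses a truncation threshold chosen here ad hoc, and the data come from a made-up noise-free second-order system. In this deterministic setting $\xi$ should coincide with $\mathcal{O}_k X_f$ up to rounding and have numerical rank $n$, which the final lines compare against the true $\mathcal{O}_k$ and $X_f$.

```python
import numpy as np

def block_hankel(w, i, k, N):
    """Rows [w(i+j), ..., w(i+j+N-1)] stacked for j = 0, ..., k-1; w has shape (T, d)."""
    return np.vstack([w[i + j:i + j + N].T for j in range(k)])

def oblique_projection(Uf, Wp, Yf, rcond=1e-10):
    """xi = R32 R22^+ Wp from the LQ decomposition of [Uf; Wp; Yf]."""
    H = np.vstack([Uf, Wp, Yf])
    _, R_factor = np.linalg.qr(H.T)   # H^T = Q R, hence H = R^T Q^T = L Q^T
    L = R_factor.T                    # lower triangular factor
    a = Uf.shape[0]
    b = a + Wp.shape[0]
    R22, R32 = L[a:b, a:b], L[b:, a:b]
    return R32 @ np.linalg.pinv(R22, rcond=rcond) @ Wp

# Made-up noise-free system with n = 2, m = 1, p = 1
A = np.array([[0.8, 0.1], [0.0, 0.6]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2]])
n, k, N = 2, 4, 500
T = 2 * k + N - 1                     # samples needed for the past and future blocks
rng = np.random.default_rng(0)
u = rng.standard_normal((T, 1))
x = np.zeros((T + 1, n))
x[0] = rng.standard_normal(n)
y = np.zeros((T, 1))
for t in range(T):
    y[t] = C @ x[t] + D @ u[t]
    x[t + 1] = A @ x[t] + B @ u[t]

Up, Yp = block_hankel(u, 0, k, N), block_hankel(y, 0, k, N)
Uf, Yf = block_hankel(u, k, k, N), block_hankel(y, k, k, N)
Wp = np.vstack([Up, Yp])
xi = oblique_projection(Uf, Wp, Yf)

Ok = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(k)])  # O_k
Xf = x[k:k + N].T                                                     # X_f
s = np.linalg.svd(xi, compute_uv=False)   # expect n dominant singular values
```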


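Suppose that the SVD of $\xi$ is given by $\xi = U\Sigma V^T$ with $\operatorname{rank}(\Sigma) = n$, where $U$, $\Sigma$, and $V$ retain only the $n$ nonzero singular values and the corresponding singular vectors. Thus, we can take the extended observability matrix as

$$\mathcal{O}_k = U\Sigma^{1/2} \qquad (1.8)$$

Hence, it follows that the state vector is given by $X_f = \mathcal{O}_k^{\dagger}\xi = \Sigma^{1/2}V^T$.

Alternatively, by using the so-called shift invariance property of the extended observability matrix (1.8), we can compute the matrices $A$ and $C$ as

$$A = \mathcal{O}_{k-1}^{\dagger}\,\mathcal{O}_k(p+1:pk,\ 1:n), \qquad C = \mathcal{O}_k(1:p,\ 1:n)$$

where $\mathcal{O}_{k-1} = \mathcal{O}_k(1:p(k-1),\ 1:n)$ consists of the first $k-1$ block rows of $\mathcal{O}_k$.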


This class of approaches is called the realization-based N4SID methods [175]. For details, see the MOESP method in Section 6.5.
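The factorization (1.8) and the shift-invariance step can be sketched as follows. To keep the example self-contained, $\xi$ is formed directly as $\mathcal{O}_k X_f$ from a made-up pair $(A, C)$ and random states, standing in for the oblique projection; the function name is hypothetical. Because the SVD fixes the state basis only up to a similarity transformation, the check compares the eigenvalues of the recovered $A$, which are basis-independent, rather than its entries.

```python
import numpy as np

def factor_and_realize(xi, n, p):
    """Rank-n SVD factorization xi = O_k X_f, cf. (1.8), followed by the
    shift-invariance estimates A = O_{k-1}^+ O_k(p+1:pk, :) and C = O_k(1:p, :)."""
    U, s, Vt = np.linalg.svd(xi, full_matrices=False)
    U1, s1, Vt1 = U[:, :n], s[:n], Vt[:n]      # keep the n dominant singular values
    Ok = U1 * np.sqrt(s1)                      # O_k = U Sigma^{1/2}
    Xf = np.sqrt(s1)[:, None] * Vt1            # X_f = Sigma^{1/2} V^T
    A_hat = np.linalg.pinv(Ok[:-p]) @ Ok[p:]   # O_{k-1} = first k-1 block rows
    C_hat = Ok[:p]
    return Ok, Xf, A_hat, C_hat

A = np.array([[0.8, 0.1], [0.0, 0.6]])
C = np.array([[1.0, -1.0]])
n, p, k, N = 2, 1, 4, 300
rng = np.random.default_rng(1)
Ok_true = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(k)])
xi = Ok_true @ rng.standard_normal((n, N))     # stands in for the oblique projection

Ok, Xf, A_hat, C_hat = factor_and_realize(xi, n, p)
print(np.sort(np.linalg.eigvals(A_hat).real))  # eigenvalues close to 0.6 and 0.8
```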

Summarizing, under certain assumptions, we can reconstruct the estimate of a sequence of state vectors and the extended observability matrix from given input-output data. Numerical methods of obtaining the state estimates and extended observability matrix of LTI systems will be explained in detail in Chapter 6. Once this “trick” is understood, subspace identification methods in the literature can be understood without any difficulty.


Why Subspace Methods? 


Although modern control design techniques have evolved based on the state space approach, the classical system identification methods were developed in the input-output framework until the mid-1980s. Only recently has the state concept been introduced in system identification, leading to the development of many subspace methods based on classical (stochastic) realization theory.

From Figure 1.5, we see some differences between the classical and subspace methods of system identification, where the left-hand side is the subspace method, and the right-hand side is the classical optimization-based method. It is interesting to observe the difference in the flow of the two approaches. In the classical method, a transfer function model is first identified, and then a state space model is obtained by using some realization technique; from the state space model, we can compute state vectors, or the Kalman filter state vectors. In subspace methods, however, we first construct the state estimates from given input-output data by using a simple procedure based on tools of numerical linear algebra, and a state space model is then obtained by solving a least-squares problem as explained above, from which we can easily compute a transfer matrix if necessary. Thus an important point in the study of subspace methods is to understand how the Kalman filter state vectors and the extended observability matrix are obtained by using tools of numerical linear algebra.

To recapitulate, the advantage of subspace methods, being based on reliable numerical algorithms of the QR decomposition and the SVD, is that we need neither (nonlinear) optimization techniques nor a canonical form imposed on the system. This implies that subspace algorithms are equally applicable to MIMO and SISO system identification. In other words, subspace methods do not suffer from the inconveniences encountered in applying PEM methods to MIMO system identification.

Figure 1.5. Subspace and classical methods of system identification ([165])


1.5 Historical Remarks 


The origin of subspace methods may date back to multivariable statistical analysis [96], in particular to the principal component analysis (PCA) and canonical correlation analysis (CCA) due to Hotelling [74, 75], developed nearly 70 years ago. It is, however, generally understood that the concepts of subspace methods spread to the areas of signal processing and system identification with the invention of the MUSIC (MUltiple SIgnal Classification) algorithm due to Schmidt [140]. We can also observe that MUSIC is an extension of the harmonic decomposition method of Pisarenko [133], which is in fact closely related to the classical idea of Levin [104] in the mid-1960s. For more detail, see the editorial of the two special issues on Subspace Methods (Parts I and II) of Signal Processing [176], and also [150, 162].


Canonical Correlation Analysis 


Hotelling [75] has developed the CCA technique to analyze linear relations between two sets of random variables. The CCA has been further developed by Anderson [14]. The predecessor of the concept of canonical correlations is that of canonical angles between two subspaces; see [21]. In fact, the $i$th canonical correlation $\rho_i$ between two sets of random variables is related to the $i$th canonical angle $\theta_i$ between the two Hilbert spaces generated by them via $\rho_i = \cos\theta_i$.
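Following the SVD-based route attributed above to Björck and Golub [21], sample canonical correlations can be computed as the cosines of the principal angles between the column spaces of two centered data matrices: orthonormalize each with a QR decomposition, then take the singular values of the product of the two orthonormal bases. The function name and the synthetic data (one exactly shared linear component plus independent noise) are assumptions for illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X (N x a) and Y (N x b),
    i.e. cosines of the canonical angles between the two centered column spaces."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # sorted in decreasing order
    return np.clip(rho, 0.0, 1.0)                      # guard against rounding above 1

rng = np.random.default_rng(0)
N = 1000
z = rng.standard_normal(N)
X = np.column_stack([z, rng.standard_normal(N)])
Y = np.column_stack([2.0 * z + 1.0, rng.standard_normal(N)])
rho = canonical_correlations(X, Y)
theta = np.arccos(rho)   # canonical angles, rho_i = cos(theta_i)
```

The first correlation equals 1 (angle 0) because the first columns are affinely related; the second is small, reflecting the independent noise columns.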
Gel’fand and Yaglom [52] have introduced the mutual information between two stationary random processes in terms of the canonical correlations of the two processes. Björck and Golub [21] have solved the canonical correlation problem by using the SVD. Akaike [2,3] has analyzed the structure of the information interface between the future and past of a stochastic process by means of the CCA, and thereby developed a novel stochastic realization theory. Pavon [126] has studied the mutual information for a vector stationary process, and Desai et al. [42,43] have developed a theory of stochastic balanced realization by using the CCA. Also, Jonckheere and Helton [77] have solved the spectral reduction problem by using the CCA and explored its relation to the Hankel norm approximation problem.
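The SVD-based computation of canonical correlations mentioned above can be illustrated with a short NumPy sketch (NumPy is our choice here; the book's own programs are in MATLAB). Assuming both inputs have full column rank, it orthonormalizes each basis by QR and takes the singular values of $Q_X^TQ_Y$ as the cosines $\rho_i = \cos\theta_i$ of the canonical angles. For sample canonical correlations, `X` and `Y` would be the centred data matrices, one column per variable. The helper name `canonical_cosines` is hypothetical.

```python
import numpy as np

def canonical_cosines(X, Y):
    """Cosines of the canonical angles between Im(X) and Im(Y) (sketch).

    Assumes X and Y have full column rank. The singular values of Qx^T Qy,
    returned in decreasing order, are cos(theta_1) >= cos(theta_2) >= ...
    """
    Qx, _ = np.linalg.qr(X)               # orthonormal basis of Im(X)
    Qy, _ = np.linalg.qr(Y)               # orthonormal basis of Im(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)           # guard against round-off slightly above 1

# Two planes in R^3 sharing the e1 direction: span{e1, e2} and span{e1, e3}.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
rho = canonical_cosines(X, Y)             # approximately [1, 0]
theta_deg = np.degrees(np.arccos(rho))    # approximately [0, 90] degrees
```

The shared direction gives a unit canonical correlation (angle zero), while the remaining directions are orthogonal.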

Hannan and Poskitt [68] have derived conditions under which a vector ARMA process has unit canonical correlations. More recently, several analytical formulas for computing canonical correlations between the past and future of stationary stochastic processes have been developed by De Cock [39].


Stochastic Realization 


Earlier results on stochastic realization are due to Anderson [9] and Faurre [45]. Related spectral factorization results based on state space methods are given by Anderson [7,8]. By using deterministic realization theory together with the LMI and algebraic Riccati equations, Faurre [45-47] has made a fundamental contribution to stochastic realization theory. In Akaike [1], a stochastic interpretation of various realization algorithms, including the algorithm of Ho and Kalman [72], is provided. Moreover, Aoki [15] has derived a stochastic realization algorithm based on the CCA and deterministic realization theory. Subspace methods of identifying state space models have been developed by De Moor et al. [41], Larimore [100, 101] and Van Overschee and De Moor [163]. Lindquist and Picci [106] have analyzed state space identification algorithms in the light of the geometric theory of stochastic realization. Also, conditional canonical correlations have been defined and employed by Katayama and Picci [90] to develop a stochastic realization theory in the presence of exogenous inputs.


Subspace Methods 


A new approach to system identification based on the QR decomposition and the SVD emerged in the late 1980s, and many papers have since been published, e.g. De Moor [41] and Moonen et al. [118, 119]. These new techniques then led to the development of various subspace identification methods, including Verhaegen and Dewilde [172, 173], Van Overschee and De Moor [164], Picci and Katayama [130], etc. In 1996, the first comprehensive book on subspace identification of linear systems was published by Van Overschee and De Moor [165]. Moreover, some recent developments in the asymptotic analysis of N4SID methods are found in Jansson and Wahlberg [76], Bauer and Jansson [19], and Chiuso and Picci [31,32]. Frequency domain subspace identification methods have also been developed in McKelvey et al. [113] and Van Overschee et al. [166]. Among many papers on subspace identification of continuous-time systems, we just mention Ohsumi et al. [120], which is based on a mathematically sound distribution approach.


1.6 Outline of the Book 


The primary goal of this book is to provide advanced students, engineers and applied scientists with in-depth knowledge of subspace methods for system identification and the associated algorithms. The plan of this book is as follows.

Part I is devoted to reviews of some results frequently used throughout this book. More precisely, Chapter 2 introduces basic facts in numerical linear algebra, including the QR decomposition, the SVD, projections and orthogonal projections, the least-squares method, the rank of Hankel matrices, etc. Some useful matrix formulas are given as problems at the end of the chapter.

Chapter 3 deals with the state space theory of linear discrete-time systems, including reachability, observability, realization theory, model reduction methods, etc.

In Chapter 4, we introduce stochastic processes and spectral analysis, and discuss the Wold decomposition theorem in the Hilbert space of a second-order stationary stochastic process. We also present a stochastic state space model, together with forward and backward Markov models for a stationary process.

Chapter 5 considers the minimum variance state estimation problem based on 
the orthogonal projection, and then derives the Kalman filter algorithm and discrete- 
time Riccati equations. Also derived are forward and backward stationary Kalman 
filters, which are respectively called forward and backward innovation models for a 
stationary stochastic process. 

Part II provides a comprehensive treatment of the theories of deterministic and stochastic realization. In Chapter 6, we deal with the classical deterministic realization result due to Ho and Kalman [72], based on the SVD of the Hankel matrix formed by impulse responses. By defining the future and past of the data, we explain how the LQ decomposition of the data matrix is utilized to retrieve information about the extended observability matrix of a linear system. We then derive the MOESP method [172] and the N4SID method [164, 165] in the deterministic setting. The influence of white noise on the SVD of a wide rectangular matrix is also discussed, and some numerical results are included.

Chapter 7 is devoted to the stochastic realization theory due to Faurre [46], based on the LMI and spectral factorization techniques, and to the associated algebraic Riccati equation (ARE) and algebraic Riccati inequality (ARI). The positive realness of covariance matrices is also proved with the help of AREs.

In Chapter 8, we present the stochastic realization theory developed by Akaike 
[2]. We discuss the predictor spaces for stationary stochastic processes. Then, based 
on the canonical correlations of the future and past of a stationary process, balanced 
and reduced stochastic realizations of Desai et al. [42,43] are derived by using the 
forward and backward Markov models. 
Part III presents our stochastic realization results and their adaptation to subspace identification methods. Chapter 9 considers a stochastic realization theory in the presence of an exogenous input, based on Picci and Katayama [130]. We first review projections in a Hilbert space and consider feedback-free conditions for the joint input-output process. We then develop a state space model with a natural block structure for such processes, based on a preliminary orthogonal decomposition of the output process into deterministic and stochastic components. By adapting it to finite input-output data, subspace identification algorithms, referred to as the ORT method, are derived based on the LQ decomposition and the SVD.

In Chapter 10, based on Katayama and Picci [90], we consider the same stochastic realization problem treated in Chapter 9. By formulating it as a multi-stage Wiener prediction problem and introducing conditional canonical correlations, we extend Akaike’s stochastic realization theory to a stochastic system with an exogenous input, deriving a subspace identification method called the CCA method. Some comparative numerical studies are included.

Chapter 11 deals with closed-loop subspace identification problems in the framework of the joint input-output approach. Based on our results [87, 88], two methods are derived by applying the ORT and CCA methods, and some simulation results are included. Also, under the assumption that the system is open-loop stable, a simple method of identifying the plant, the controller and the noise model based on the ORT method is presented [92].

Finally, Appendix A reviews the classical least-squares method for linear regression models and its relation to the LQ decomposition. Appendix B is concerned with input signals for system identification and the PE condition for deterministic as well as stationary stochastic signals. In Appendix C, we derive an overlapping parametrization of MIMO linear stochastic systems. Appendix D presents some of the MATLAB® programs used for simulation studies in this book. Solutions to problems are also provided in Appendix E.


1.7 Notes and References 


Among many books on system identification, we just mention Box and Jenkins [22], Goodwin and Payne [61], Ljung [109], Söderström and Stoica [145], and a recent book by Pintelon and Schoukens [132], which is devoted to a frequency domain approach. The book by Van Overschee and De Moor [165] is the first comprehensive book on subspace identification of linear systems, and there are some sections dealing with subspace methods in [109, 132]. Also, Mehra and Lainiotis [116], a research-oriented monograph, includes a collection of important articles on system identification from the mid-1970s.
Part I Preliminaries

2 Linear Algebra and Preliminaries


In this chapter, we review some basic results in numerical linear algebra, which are 
repeatedly used in later chapters. Among others, the QR decomposition and the sin- 
gular value decomposition (SVD) are the most valuable tools in the areas of signal 
processing and system identification. 


2.1 Vectors and Matrices 


Let $\mathbb{R}$ be the set of real numbers, $\mathbb{R}^n$ the set of $n$-dimensional real vectors, and $\mathbb{R}^{m\times n}$ the set of $m \times n$ real matrices. Lower-case letters $x, y, \ldots$ denote vectors, and capital letters $A, B, C, \ldots;\ X, Y, Z, \ldots$ denote matrices. The transposes of a vector $x$ and a matrix $A$ are denoted by $x^T$ and $A^T$, respectively. The determinant of a square matrix $A$ is denoted by $|A|$ or $\det(A)$, and the trace by $\mathrm{trace}(A)$.

The $n \times n$ identity matrix is denoted by $I_n$. If there is no confusion, we simply write $I$, deleting the subscript denoting the dimension. The inverse of a square matrix $A$ is denoted by $A^{-1}$. We also use $A^{-T}$ to denote $(A^{-1})^T = (A^T)^{-1}$. A matrix satisfying $A^T = A$ is called a symmetric matrix. If a matrix $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ satisfies $A^TA = I_n$, it is called an orthogonal matrix. Thus, for an orthogonal matrix $A = [a_1\ a_2\ \cdots\ a_n]$, $a_i \in \mathbb{R}^m$, $i = 1, \ldots, n$, we have $a_i^Ta_j = \delta_{ij}$, $i, j = 1, \ldots, n$, where $\delta_{ij}$ is the Kronecker delta defined by

$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$

For vectors $x, y \in \mathbb{R}^n$, the inner product is defined by

$$(x, y) = x^Ty = \sum_{i=1}^n x_iy_i = y^Tx$$

Also, for $A \in \mathbb{R}^{n\times n}$ and $x \in \mathbb{R}^n$, we define the quadratic form

$$x^TAx = (x, Ax) = \sum_{i,j=1}^n a_{ij}x_ix_j \qquad (2.1)$$

Define $\bar{A} = (A + A^T)/2$. Then we have $x^TAx = x^T\bar{A}x$. Thus it is assumed without loss of generality that $A$ is symmetric in defining a quadratic form.
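A quick numerical check of the symmetrization argument, as a NumPy sketch with an arbitrary random matrix (the seed and sizes are illustrative choices): the skew-symmetric part $(A - A^T)/2$ contributes nothing to $x^TAx$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # a generic, non-symmetric matrix
A_bar = (A + A.T) / 2                # its symmetric part
x = rng.standard_normal(4)

q_A = x @ A @ x                      # x^T A x
q_Abar = x @ A_bar @ x               # x^T A_bar x, equal to q_A up to round-off
```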

If $x^TAx > 0$ for all $x \ne 0$, then $A$ is positive definite, which is written as $A > 0$. If $x^TAx \ge 0$ holds for all $x$, $A$ is called nonnegative definite, which is written as $A \ge 0$. Moreover, if $A - B > 0$ (or $\ge 0$) holds, then we simply write $A > B$ (or $A \ge B$).

The basic facts for real vectors and matrices mentioned above carry over to complex vectors and matrices. Let $\mathbb{C}$ be the set of complex numbers, $\mathbb{C}^n$ the set of $n$-dimensional complex vectors, and $\mathbb{C}^{m\times n}$ the set of $m \times n$ complex matrices. The complex conjugate of $\lambda \in \mathbb{C}$ is denoted by $\bar{\lambda}$, and similarly the complex conjugate transpose of $A = (a_{ij}) \in \mathbb{C}^{m\times n}$ is denoted by $A^H = (\bar{a}_{ji})$. We say that $A \in \mathbb{C}^{n\times n}$ is Hermitian if $A^H = A$, and unitary if $A^HA = I_n$.

The inner product of $x, y \in \mathbb{C}^n$ is defined by

$$(x, y) = \sum_{i=1}^n x_i\bar{y}_i = y^Hx$$

As in the real case, the quadratic form $x^HAx$, $x \in \mathbb{C}^n$, is defined for a Hermitian matrix $A$. We say that $A$ is positive definite if $x^HAx > 0$ for all $x \ne 0$, and nonnegative definite if $x^HAx \ge 0$ for all $x$; being positive (nonnegative) definite is written as $A > 0$ ($A \ge 0$).

The characteristic polynomial of $A \in \mathbb{R}^{n\times n}$ is defined by

$$\varphi_A(z) := \det(zI - A) = z^n + \alpha_1z^{n-1} + \cdots + \alpha_{n-1}z + \alpha_n \qquad (2.2)$$

The $n$ roots of $\varphi_A(z) = 0$ are called the eigenvalues of $A$. The set of eigenvalues of $A$, denoted by $\sigma(A)$, is called the spectrum of $A$. The $i$th eigenvalue is denoted by $\lambda_i(A)$. Since $\varphi_A(z)$ has real coefficients, if $\lambda \in \mathbb{C}$ is an eigenvalue, so is $\bar{\lambda}$. If $\lambda \in \sigma(A)$, there exists a vector $v \in \mathbb{C}^n$ satisfying

$$Av = \lambda v, \quad v \ne 0$$

In this case, $v \in \mathbb{C}^n$ is called an eigenvector corresponding to the eigenvalue $\lambda$. It may be noted that, since the eigenvalues are in general complex, the corresponding eigenvectors are in general complex as well.

Let the characteristic polynomial of $A \in \mathbb{R}^{n\times n}$ be given by (2.2). Then the following matrix polynomial equation holds:

$$\varphi_A(A) = A^n + \alpha_1A^{n-1} + \cdots + \alpha_{n-1}A + \alpha_nI = 0 \qquad (2.3)$$

where the right-hand side is the zero matrix of size $n \times n$. This result is known as the Cayley-Hamilton theorem.
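The Cayley-Hamilton theorem (2.3) can be checked numerically with a small NumPy sketch. `np.poly` returns the coefficients $[1, \alpha_1, \ldots, \alpha_n]$ of $\det(zI - A)$ (computed from the eigenvalues), and the helper `char_poly_at_matrix`, a hypothetical name, evaluates $\varphi_A(A)$ by Horner's rule with a matrix argument. This is a numerical illustration, not a proof.

```python
import numpy as np

def char_poly_at_matrix(A):
    """Evaluate phi_A(A) = A^n + alpha_1 A^(n-1) + ... + alpha_(n-1) A + alpha_n I."""
    n = A.shape[0]
    coeffs = np.poly(A)              # [1, alpha_1, ..., alpha_n] of det(zI - A)
    result = np.zeros((n, n))
    for c in coeffs:                 # Horner's rule with a matrix argument
        result = result @ A + c * np.eye(n)
    return result

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])           # phi_A(z) = (z - 2)(z - 3) = z^2 - 5z + 6
coeffs = np.poly(A)                  # approximately [1, -5, 6]
residual = char_poly_at_matrix(A)    # zero matrix up to round-off
```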

The eigenvalues $\lambda_i$, $i = 1, \ldots, n$, of a symmetric nonnegative definite matrix $A \in \mathbb{R}^{n\times n}$ are nonnegative, and, since $A$ is symmetric, there exists an orthogonal matrix $T$ that transforms $A$ into a diagonal form, i.e.,

$$T^TAT = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$

Define $\mu_i = \sqrt{\lambda_i}$, $i = 1, \ldots, n$. Then we have

$$\mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \mathrm{diag}(\mu_1, \ldots, \mu_n)\,\mathrm{diag}(\mu_1, \ldots, \mu_n)$$

Also, let $B$ be given by

$$B = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_n)\,T^T$$

Then it follows that $A = B^TB$, so that $B$ is called a square root matrix of $A$, and is written as $\sqrt{A}$ or $A^{1/2}$. For any orthogonal matrix $Q$, we see that $B_1 = QB$ satisfies $A = B_1^TB_1$, so that $B_1$ is also a square root matrix of $A$, showing that a square root matrix is not unique.
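The construction of a square root matrix, and its non-uniqueness, can be reproduced with a NumPy sketch. It assumes `np.linalg.eigh` for the orthogonal diagonalization $A = T\,\mathrm{diag}(\lambda_i)\,T^T$; the random test matrix and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 5))
A = M @ M.T                          # symmetric nonnegative definite, 3 x 3

lam, T = np.linalg.eigh(A)           # A = T diag(lam) T^T with T orthogonal
lam = np.clip(lam, 0.0, None)        # clear tiny negative round-off
B = np.diag(np.sqrt(lam)) @ T.T      # B = diag(mu_1, ..., mu_n) T^T, so A = B^T B

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # some orthogonal Q
B1 = Q @ B                           # another square root: B1^T B1 = A
```

Both `B` and `B1` satisfy $A = B^TB$, although they are different matrices.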

Suppose that $A = (a_{ij}) \in \mathbb{R}^{m\times n}$. Then $A(p:q, r:s)$ denotes the submatrix of $A$ formed by rows $p, p+1, \ldots, q$ and columns $r, r+1, \ldots, s$, e.g.,

$$A(2:4, 3:6) = \begin{bmatrix} a_{23} & a_{24} & a_{25} & a_{26} \\ a_{33} & a_{34} & a_{35} & a_{36} \\ a_{43} & a_{44} & a_{45} & a_{46} \end{bmatrix}$$

In particular, $A(p:q, :)$ means the submatrix formed by rows $p, p+1, \ldots, q$, and similarly $A(:, r:s)$ the submatrix formed by columns $r, r+1, \ldots, s$. Also, $A(i, :)$ and $A(:, j)$ respectively represent the $i$th row and $j$th column of $A$.
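The colon notation above is 1-based and inclusive (as in MATLAB), whereas NumPy slicing is 0-based and half-open. A minimal sketch of the translation, using a hypothetical helper `sub` and a test matrix whose entries encode their own indices ($a_{ij} = 10i + j$):

```python
import numpy as np

m, n = 5, 7
# Entry a_ij is stored as 10*i + j (1-based i, j), so each value shows its own indices.
A = np.array([[10 * i + j for j in range(1, n + 1)] for i in range(1, m + 1)])

def sub(A, p, q, r, s):
    """Hypothetical helper: the book's A(p:q, r:s), 1-based and inclusive."""
    return A[p - 1:q, r - 1:s]

S = sub(A, 2, 4, 3, 6)   # A(2:4, 3:6): rows 2..4, columns 3..6
row_2 = A[2 - 1, :]      # A(2,:)
col_3 = A[:, 3 - 1]      # A(:,3)
print(S)
# [[23 24 25 26]
#  [33 34 35 36]
#  [43 44 45 46]]
```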


2.2 Subspaces and Linear Independence 


In the following, we assume that scalars are real, but all the results extend to complex scalars.

Consider a nonempty set $\mathcal{V}$. For $x, y \in \mathcal{V}$ and for a scalar $a \in \mathbb{R}$, the sum $x + y$ and the product $ax$ are defined. Suppose that $\mathcal{V}$ satisfies the axioms of a linear space with respect to the addition and product defined above. Then $\mathcal{V}$ is called a linear space over $\mathbb{R}$. The sets of $n$-dimensional vectors $\mathbb{R}^n$ and $\mathbb{C}^n$ are linear spaces over $\mathbb{R}$ and $\mathbb{C}$, respectively.
Suppose that $\mathcal{W}$ is a subset of a linear space $\mathcal{V}$. If $a_1w_1 + a_2w_2 \in \mathcal{W}$ holds for any $w_1, w_2 \in \mathcal{W}$ and $a_1, a_2 \in \mathbb{R}$, then $\mathcal{W}$ is called a subspace of $\mathcal{V}$, and this fact is simply expressed as $\mathcal{W} \subset \mathcal{V}$.

For a set of vectors $\{x_1, \ldots, x_n\}$ in $\mathbb{R}^m$, if there exist scalars $a_1, \ldots, a_n$, with $a_i \ne 0$ for at least one $i$, such that

$$\sum_{j=1}^n a_jx_j = 0$$

holds, then $\{x_1, \ldots, x_n\}$ are called linearly dependent. Conversely, if we have

$$\sum_{j=1}^n a_jx_j = 0 \implies a_1 = \cdots = a_n = 0$$

then $\{x_1, \ldots, x_n\}$ are called linearly independent.
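All the linear combinations of vectors $\{w_1, \ldots, w_p\}$ in $\mathbb{R}^n$ form a subspace of $\mathbb{R}^n$, which is written as

$$\mathcal{W} = \mathrm{span}\{w_1, \ldots, w_p\} = \left\{ \sum_{j=1}^p a_jw_j \;\middle|\; a_1, \ldots, a_p \in \mathbb{R} \right\}$$

If $\{w_1, \ldots, w_p\}$ are linearly independent, they are called a basis of the space $\mathcal{W}$.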
Suppose that $\mathcal{V}$ is a subspace of $\mathbb{R}^n$. Then there exists a basis $\{v_1, \ldots, v_d\}$ in $\mathcal{V}$ such that

$$\mathcal{V} = \mathrm{span}\{v_1, \ldots, v_d\}$$

Hence, any $x \in \mathcal{V}$ can be expressed as a linear combination of the form

$$x = \sum_{j=1}^d \beta_jv_j, \quad \beta_1, \ldots, \beta_d \in \mathbb{R}$$

where $\beta_1, \ldots, \beta_d$ are the components of $x$ with respect to the basis $\{v_1, \ldots, v_d\}$. The choice of basis is not unique, but the number of elements of any basis is unique. This number is called the dimension of $\mathcal{V}$, which is denoted by $\dim(\mathcal{V})$.

For a matrix $A \in \mathbb{R}^{m\times n}$, the image of $A$ is defined by

$$\mathrm{Im}(A) = \{y \in \mathbb{R}^m \mid y = Ax,\ x \in \mathbb{R}^n\} = A\mathbb{R}^n$$

This is a subspace of $\mathbb{R}^m$, and is also called the range of $A$. If $A = [a_1\ \cdots\ a_n]$, then we have $\mathrm{Im}(A) = \mathrm{span}\{a_1, \ldots, a_n\}$. Moreover, the set of vectors mapped to zero is called the kernel of $A$, which is written as

$$\mathrm{Ker}(A) = \{x \in \mathbb{R}^n \mid Ax = 0\}$$

This is also called the null space of $A$, and is a subspace of $\mathbb{R}^n$.
The rank of $A \in \mathbb{R}^{m\times n}$ is defined by $\dim(\mathrm{Im}\,A)$ and is denoted by $\mathrm{rank}(A)$. We see that $\mathrm{rank}(A) = r$ if and only if the maximum number of linearly independent vectors among the column vectors $a_1, \ldots, a_n$ of $A$ is $r$. This is also equal to the maximum number of linearly independent vectors among the row vectors of $A$. Thus it follows that $\mathrm{rank}(A) = \mathrm{rank}(A^T)$.

It can be shown that for $A \in \mathbb{R}^{m\times n}$,

$$\dim(\mathrm{Im}\,A) + \dim(\mathrm{Ker}\,A) = n \qquad (2.4)$$

Hence, if $m = n$ holds, the following are equivalent:

(i) $A$ is nonsingular; (ii) $\mathrm{Ker}(A) = \{0\}$; (iii) $\mathrm{rank}(A) = n$.
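Relation (2.4) can be checked numerically. The NumPy sketch below borrows the SVD of Section 2.6: the right singular vectors beyond the numerical rank span $\mathrm{Ker}(A)$. The helper name `kernel_basis` and the example matrix are illustrative assumptions.

```python
import numpy as np

def kernel_basis(A):
    """Orthonormal basis of Ker(A) from the SVD (sketch; see Section 2.6)."""
    r = np.linalg.matrix_rank(A)          # numerical rank, SVD-based
    _, _, Vt = np.linalg.svd(A)           # full_matrices=True: Vt is n x n
    return Vt[r:].T                       # last n - r right singular vectors

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],       # second row = 2 * first row
              [0.0, 1.0, 1.0, 0.0]])      # 3 x 4
K = kernel_basis(A)
r = np.linalg.matrix_rank(A)              # dim Im(A) = 2, so dim Ker(A) = 4 - 2 = 2
```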


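Suppose that $x, y \in \mathbb{R}^n$. If $x^Ty = 0$, i.e., if the vectors are mutually orthogonal, we write $x \perp y$. If $y^Tx = 0$ holds for all $x \in \mathcal{V} \subset \mathbb{R}^n$, we say that $y$ is orthogonal to $\mathcal{V}$, which is written as $y \perp \mathcal{V}$. The set of $y \in \mathbb{R}^n$ satisfying $y \perp \mathcal{V}$ is called the orthogonal complement, which is expressed as

$$\mathcal{V}^\perp = \{y \in \mathbb{R}^n \mid y^Tx = 0,\ \forall x \in \mathcal{V}\}$$

The orthogonal complement $\mathcal{V}^\perp$ is a subspace whether or not $\mathcal{V}$ is a subspace.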

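Let $\mathcal{V}, \mathcal{W} \subset \mathbb{R}^n$ be two subspaces. If $v^Tw = 0$ holds for any $v \in \mathcal{V}$ and $w \in \mathcal{W}$, then we say that $\mathcal{V}$ and $\mathcal{W}$ are orthogonal, and we write $\mathcal{V} \perp \mathcal{W}$. Also, the vector sum of $\mathcal{V}$ and $\mathcal{W}$ is defined by

$$\mathcal{V} \vee \mathcal{W} = \{v + w \mid v \in \mathcal{V},\ w \in \mathcal{W}\}$$

It may be noted that this is not the union $\mathcal{V} \cup \mathcal{W}$ of the two subspaces. Moreover, if $\mathcal{V} \cap \mathcal{W} = \{0\}$ holds, the vector sum is called the direct sum, and is written as $\mathcal{V} \dotplus \mathcal{W}$. Also, if $\mathcal{V} \perp \mathcal{W}$ holds, then it is called the direct orthogonal sum, and is expressed as $\mathcal{V} \oplus \mathcal{W}$.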

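For a subspace $\mathcal{V} \subset \mathbb{R}^n$, we have the direct orthogonal sum decomposition

$$\mathbb{R}^n = \mathcal{V} \oplus \mathcal{V}^\perp \qquad (2.5)$$

This implies that every $x \in \mathbb{R}^n$ has a unique decomposition $x = v + w$, $v \in \mathcal{V}$, $w \in \mathcal{V}^\perp$.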
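Let $\mathcal{V} \subset \mathbb{R}^n$ be a subspace, and $A \in \mathbb{R}^{n\times n}$ a linear transform. If

$$x \in \mathcal{V} \implies Ax \in \mathcal{V} \quad (A\mathcal{V} \subset \mathcal{V})$$

holds, $\mathcal{V}$ is called an $A$-invariant subspace. The spaces spanned by eigenvectors, as well as $\mathrm{Im}(A)$ and $\mathrm{Ker}(A)$, are all important $A$-invariant subspaces of $\mathbb{R}^n$.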


2.3 Norms of Vectors and Matrices 


Definition 2.1. A vector norm $\|\cdot\|$ has the following properties.




(i) $\|x\| \ge 0$; $\|x\| = 0 \iff x = 0$
(ii) $\|\lambda x\| = |\lambda|\,\|x\|$, $\lambda$: scalar
(iii) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)


For a vector $x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$, the 2-norm (or Euclidean norm) is defined by

$$\|x\|_2 = \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2}$$

and the infinity-norm is defined by

$$\|x\|_\infty = \max(|x_1|, \ldots, |x_n|)$$


Since $A \in \mathbb{R}^{m\times n}$ can be viewed as a vector in $\mathbb{R}^{mn}$, the definition of a matrix norm should be compatible with that of the vector norm. The most popular matrix norms are the Frobenius norm and the 2-norm. The former is given by

$$\|A\|_F = \left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{1/2} = \sqrt{\mathrm{trace}(A^TA)} \qquad (2.6)$$

The latter is called an operator norm, which is defined by

$$\|A\|_2 = \sup_{x \ne 0}\frac{\|Ax\|_2}{\|x\|_2} \qquad (2.7)$$


We have the following inequalities for the above two norms:

$$\|Ax\|_2 \le \|A\|_\alpha\|x\|_2, \quad \|AB\|_\alpha \le \|A\|_\alpha\|B\|_\alpha, \quad \alpha = 2, F$$

If $Q$ is orthogonal, i.e., $Q^TQ = I$, we have $\|Qx\|_2^2 = x^TQ^TQx = \|x\|_2^2$. Moreover, it follows that $\|QA\|_\alpha = \|A\|_\alpha$ for $\alpha = 2, F$. Thus we see that the 2-norm and the Frobenius norm are invariant under orthogonal transforms. We often write the 2-norm of $x$ as $\|x\|$, suppressing the subscript.
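The norm definitions and the orthogonal invariance can be checked with NumPy. Note that `np.linalg.norm(A, 2)` evaluates the 2-norm as the largest singular value (Section 2.6) rather than by the supremum in (2.7); the random matrices and seed below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random 4 x 4 orthogonal matrix

fro = np.linalg.norm(A, 'fro')
fro_trace = np.sqrt(np.trace(A.T @ A))             # second expression in (2.6)
two = np.linalg.norm(A, 2)                         # operator norm (2.7)

# Invariance under the orthogonal transform Q (alpha = 2, F).
fro_QA = np.linalg.norm(Q @ A, 'fro')
two_QA = np.linalg.norm(Q @ A, 2)
```

Sampling ratios $\|Ax\|_2/\|x\|_2$ over random $x$ never exceeds `two`, which is consistent with the supremum definition.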

For a complex vector $x \in \mathbb{C}^n$ and a complex matrix $A \in \mathbb{C}^{m\times n}$, their norms are defined similarly to the real case.


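Lemma 2.1. For $A \in \mathbb{R}^{n\times n}$, the spectral radius is defined by

$$\rho(A) = \max\{|\lambda_i(A)| \mid i = 1, \ldots, n\} \qquad (2.8)$$

Then $\rho(A) \le \|A\|_\alpha$ holds for $\alpha = 2, F$.

Proof. Clearly, there exists an eigenvalue $\lambda$ for which $|\lambda| = \rho(A)$. Let $Ax = \lambda x$, $x \ne 0$. Let $X := [x\ x\ \cdots\ x] \in \mathbb{C}^{n\times n}$, and consider $AX = \lambda X$. Then, for $\alpha = 2, F$, we have

$$|\lambda|\,\|X\|_\alpha = \|\lambda X\|_\alpha = \|AX\|_\alpha \le \|A\|_\alpha\|X\|_\alpha, \quad \|X\|_\alpha \ne 0$$

and hence $|\lambda| = \rho(A) \le \|A\|_\alpha$.

More generally, the same argument applies to any matrix norm satisfying $\|AB\| \le \|A\|\,\|B\|$; see [73].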
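A small NumPy illustration of Lemma 2.1, with example matrices chosen for readability: for a non-symmetric matrix the spectral radius can be much smaller than both norms, while for a symmetric matrix $\rho(A) = \|A\|_2$ holds.

```python
import numpy as np

A = np.array([[0.5, 2.0],
              [0.0, -0.9]])                    # non-symmetric; eigenvalues 0.5 and -0.9
rho = np.max(np.abs(np.linalg.eigvals(A)))    # spectral radius (2.8), here 0.9
norm2 = np.linalg.norm(A, 2)                  # larger than 2 because of the 2.0 entry
normF = np.linalg.norm(A, 'fro')

S = np.array([[2.0, 1.0],
              [1.0, -3.0]])                    # symmetric
rho_S = np.max(np.abs(np.linalg.eigvals(S)))  # equals the 2-norm of S
```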
2.4 QR Decomposition


In order to consider the QR decomposition, we first introduce an orthogonal transform called the Householder transform (see Figure 2.1).


Lemma 2.2. Consider two vectors $x \ne y \in \mathbb{R}^n$ with $\|x\| = \|y\|$. Then there exists a vector $u \in \mathbb{R}^n$ such that

$$(I - 2uu^T)x = y, \quad \|u\| = 1 \qquad (2.9)$$

The vector $u$ is uniquely determined up to sign by

$$u = \frac{x - y}{\|x - y\|} \qquad (2.10)$$

Figure 2.1. Householder transform ($Px = y$; labels $\mathrm{span}\{u\}$ and $\mathrm{span}\{u\}^\perp$)
Proof. By using (2.10) and the facts that $\|x\| = \|y\|$ and $x^Ty = y^Tx$, we compute the left-hand side of (2.9) to get

$$(I - 2uu^T)x = x - \frac{2(x-y)(x-y)^Tx}{(x-y)^T(x-y)} = x - \frac{2(x-y)(x^Tx - y^Tx)}{x^Tx - 2y^Tx + y^Ty} = x - \frac{2(x-y)(x^Tx - y^Tx)}{2(x^Tx - y^Tx)} = y$$

Suppose now that a vector $v \in \mathbb{R}^n$ also satisfies the conditions of this lemma, i.e., $\|v\| = 1$ and $(I - 2vv^T)x = y$. Then

$$(I - 2uu^T)x = (I - 2vv^T)x \implies u(u^Tx) = v(v^Tx)$$

Since $u^Tx = (x^Tx - y^Tx)/\|x - y\| = \|x - y\|/2 \ne 0$, the vectors $u$ and $v$ are parallel, and since both have unit norm, $v^Tu = \pm 1$. Thus it follows that $v = \pm u$, showing the uniqueness of the vector $u$ up to sign.


The matrix $P := I - 2uu^T$ of Lemma 2.2, called the Householder transform, is symmetric and satisfies

$$P^2 = (I - 2uu^T)^2 = I - 4uu^T + 4u(u^Tu)u^T = I$$

Thus it follows that $P^{-1} = P = P^T$, implying that $P$ is an orthogonal transform, and that $Px = y$, $Py = x$ hold.

Let $a, b \in \mathbb{R}^n$ with $\|a\| = \|b\|$. We consider the problem of transforming the vector $a$ into a vector $b$ whose first element is nonzero and whose other elements are all zero. More precisely, we wish to find a transform that performs the following reduction:

$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \longrightarrow b = \begin{bmatrix} b_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \|a\| = \|b\|$$

Since $\|a\| = \|b\|$, we see that $b_1 = \pm\|a\|$. It follows that

$$\bar{a} := a - b = \begin{bmatrix} a_1 - b_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \quad \|\bar{a}\|^2 = 2b_1(b_1 - a_1) = -2b_1\bar{a}_1$$

It should be noted that if the sign of $b_1$ is chosen to be the same as that of $a_1$, then $|a_1 - b_1|$ may be small, so that a large relative error may arise in the term $\|\bar{a}\|^2 = \bar{a}^T\bar{a}$. This difficulty can be avoided simply by choosing the sign of $b_1$ opposite to that of $a_1$.

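We now define

$$P := I + \frac{\bar{a}\bar{a}^T}{b_1\bar{a}_1} \qquad (2.11)$$

which coincides with the Householder transform $I - 2\bar{a}\bar{a}^T/\|\bar{a}\|^2$, since $\|\bar{a}\|^2 = -2b_1\bar{a}_1$. Noting that $a^Ta = b_1^2$ and $b^Ta = b_1a_1$, we have

$$Pa = \left(I + \frac{(a-b)(a-b)^T}{b_1\bar{a}_1}\right)a = a + \frac{(a-b)(a^Ta - b^Ta)}{b_1(a_1 - b_1)} = a + \frac{(a-b)\,b_1(b_1 - a_1)}{b_1(a_1 - b_1)} = a - (a - b) = b$$

Hence, knowing $\bar{a} = a - b$ and $b_1$, we can transform the vector $a$ into the vector $b$ of the specified form. It is well known that this method is very efficient and reliable, since only the first component of $a$ is modified in computing $\bar{a}$. In the following, $\bar{a}$ plays the role of the vector $u$ in the Householder transform, though the norm of $\bar{a}$ is not unity.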
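The reduction of $a$ to $b = (b_1, 0, \ldots, 0)^T$ via (2.11) can be sketched in NumPy. The helper names `householder` and `apply_householder` are hypothetical; the sign of $b_1$ is chosen opposite to that of $a_1$, as recommended above, and $P$ is applied to a vector without being formed explicitly.

```python
import numpy as np

def householder(a):
    """Return (a_bar, b1) such that P = I + a_bar a_bar^T / (b1 a_bar[0]) maps a to b1 e1.

    Follows (2.11); the sign of b1 is opposite to that of a[0] to avoid
    cancellation in a_bar[0] = a[0] - b1. Assumes a != 0.
    """
    a = np.asarray(a, dtype=float)
    sign = 1.0 if a[0] >= 0 else -1.0
    b1 = -sign * np.linalg.norm(a)
    a_bar = a.copy()
    a_bar[0] = a[0] - b1              # only the first component changes
    return a_bar, b1

def apply_householder(a_bar, b1, x):
    """Compute P x = x + a_bar (a_bar^T x) / (b1 a_bar[0]) without forming P."""
    return x + a_bar * (a_bar @ x) / (b1 * a_bar[0])

a = np.array([3.0, 4.0, 0.0])
a_bar, b1 = householder(a)            # b1 = -5, a_bar = [8, 4, 0]
Pa = apply_householder(a_bar, b1, a)  # approximately [-5, 0, 0]
```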

Now we introduce the QR decomposition, which is quite useful in numerical linear algebra. We assume that matrices are real, though the QR decomposition is also applicable to complex matrices.

Lemma 2.3. A tall rectangular matrix $A \in \mathbb{R}^{m\times n}$, $m \ge n$, is decomposed into a product of two matrices:

$$A = QR \qquad (2.12)$$

where $Q \in \mathbb{R}^{m\times n}$ is an orthogonal matrix with $Q^TQ = I_n$, and $R \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. The right-hand side of (2.12) is called the QR decomposition of $A$.
Proof. The decomposition implies $Q^TA = R$; that is, we need an orthogonal transform that reduces the given matrix $A$ to an upper triangular matrix. In the following, we give a method of performing this transform by means of Householder transforms.

Let $a^{(1)} = A(:, 1)$, the first column vector of $A$. By computing $u^{(1)} := \bar{a}^{(1)}$ and $b_1^{(1)}$, we perform the following transform:

$$a^{(1)} = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} \longrightarrow u^{(1)} = \begin{bmatrix} a_{11} - b_1^{(1)} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad b^{(1)} = \begin{bmatrix} b_1^{(1)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where $b_1^{(1)} = \pm\|a^{(1)}\|$. According to (2.11), let

$$P^{(1)} := I_m + \frac{u^{(1)}(u^{(1)})^T}{b_1^{(1)}u_1^{(1)}}$$

and $A^{(1)} := P^{(1)}A$. Then we get

$$P^{(1)}A = A^{(1)} = \begin{bmatrix} b_1^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\ 0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\ \vdots & \vdots & & \vdots \\ 0 & a_{m2}^{(1)} & \cdots & a_{mn}^{(1)} \end{bmatrix}$$

Thus the first column vector of $A^{(1)}$ is reduced to the vector $b^{(1)}$, while the column vectors $a_2, \ldots, a_n$ are modified by $P^{(1)}$. In the transforms that follow, the vector $b^{(1)} = A^{(1)}(:, 1)$ remains intact, and it becomes the first column of $R$.


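Next we consider the transform of the second column vector of $A^{(1)}$. We define $a^{(2)}$, $u^{(2)}$ and $b^{(2)}$ as

$$a^{(2)} = \begin{bmatrix} 0 \\ a_{22}^{(1)} \\ a_{32}^{(1)} \\ \vdots \\ a_{m2}^{(1)} \end{bmatrix} \longrightarrow u^{(2)} = \begin{bmatrix} 0 \\ a_{22}^{(1)} - b_2^{(2)} \\ a_{32}^{(1)} \\ \vdots \\ a_{m2}^{(1)} \end{bmatrix}, \quad b^{(2)} = \begin{bmatrix} 0 \\ b_2^{(2)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where $b_2^{(2)} = \pm\|a^{(2)}\|$, and, as in (2.11), let $P^{(2)} := I_m + u^{(2)}(u^{(2)})^T/(b_2^{(2)}u_2^{(2)})$. We see that $P^{(2)}$ is an orthogonal matrix whose first row and column are zero except for the $(1,1)$-element, which equals 1. Thus pre-multiplying $A^{(1)}$ by $P^{(2)}$ yields

$$P^{(2)}A^{(1)} = P^{(2)}P^{(1)}A = A^{(2)} = \begin{bmatrix} b_1^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1n}^{(1)} \\ 0 & b_2^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & a_{m3}^{(2)} & \cdots & a_{mn}^{(2)} \end{bmatrix}$$

where we note that the first row and column of $A^{(2)}$ are the same as those of $A^{(1)}$ due to the form of $P^{(2)}$.

Repeating this procedure until the $n$th column, we get an upper triangular matrix $A^{(n)}$ of the form

$$P^{(n)}P^{(n-1)}\cdots P^{(1)}A = A^{(n)} = \begin{bmatrix} R \\ 0 \end{bmatrix} \qquad (2.13)$$

Since each $P^{(j)}$, $j = 1, \ldots, n$, is orthogonal and symmetric, we get

$$A = P^{(1)}P^{(2)}\cdots P^{(n)}\begin{bmatrix} R \\ 0 \end{bmatrix} = QR$$

where $R \in \mathbb{R}^{n\times n}$ is upper triangular and $Q = [q_1\ \cdots\ q_n] \in \mathbb{R}^{m\times n}$, formed by the first $n$ columns of $P^{(1)}P^{(2)}\cdots P^{(n)}$, is orthogonal. This completes the proof of the lemma.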
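The constructive proof translates directly into a short NumPy sketch of Householder QR. It is an illustration of the procedure above, not a replacement for the LAPACK-based `np.linalg.qr`; the function name `householder_qr` is hypothetical. Each step applies (2.11) to rows $j, \ldots, m-1$ (0-based) and accumulates $P^{(1)}P^{(2)}\cdots$ so that the first $n$ columns give $Q$.

```python
import numpy as np

def householder_qr(A):
    """Thin QR of a tall A (m >= n) by successive Householder transforms (sketch).

    Returns Q (m x n, Q^T Q = I_n) and upper triangular R (n x n) with A = Q R.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q_full = np.eye(m)
    for j in range(min(n, m - 1)):
        a = R[j:, j]
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:
            continue                        # nothing to eliminate in this column
        b1 = -np.copysign(norm_a, a[0])     # sign opposite to a[0]
        u = a.copy()
        u[0] -= b1                          # u = a - b1 e1
        P = np.eye(m - j) + np.outer(u, u) / (b1 * u[0])   # (2.11) on rows j..m-1
        R[j:, :] = P @ R[j:, :]             # P^(j+1) acting on the current matrix
        Q_full[:, j:] = Q_full[:, j:] @ P   # accumulate P^(1) P^(2) ... (each P symmetric)
    return Q_full[:, :n], np.triu(R[:n, :])

A = np.random.default_rng(8).standard_normal((6, 4))
Q, R = householder_qr(A)
```

Up to the signs of its rows, `R` agrees with the factor returned by `np.linalg.qr`, since the QR decomposition of a full-column-rank matrix is unique up to such signs.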


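The QR decomposition is quite useful for computing an orthonormal basis for a set of vectors. In fact, it is a matrix realization of the Gram-Schmidt orthogonalization process. Suppose that $A \in \mathbb{R}^{m\times n}$ and $\mathrm{rank}(A) = n$. Let the QR decomposition of $A$ be given by

$$A = \begin{bmatrix} Q_A & Q_A^\perp \end{bmatrix}\begin{bmatrix} R \\ 0 \end{bmatrix} = Q_AR, \quad Q_A \in \mathbb{R}^{m\times n},\ Q_A^\perp \in \mathbb{R}^{m\times(m-n)} \qquad (2.14)$$

Since $R$ is nonsingular, we have $\mathrm{Im}(A) = \mathrm{Im}(Q_A)$, i.e., the column vectors of $Q_A$ form an orthonormal basis of $\mathrm{Im}(A)$, and those of $Q_A^\perp$ form an orthonormal basis of the orthogonal complement $(\mathrm{Im}\,A)^\perp$.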
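The partition (2.14) can be obtained from NumPy's `np.linalg.qr` with `mode='complete'`, which returns a square $Q$ and an $m \times n$ factor whose rows below the $n$th are zero. A sketch with an illustrative random full-column-rank $A$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
A = rng.standard_normal((m, n))          # full column rank with probability one

Q, R = np.linalg.qr(A, mode='complete')  # Q: m x m, R: m x n with zeros below row n
Q_A, Q_A_perp = Q[:, :n], Q[:, n:]       # bases of Im(A) and (Im A)^perp
```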

It should, however, be noted that if $\mathrm{rank}(A) = r < n$, the QR decomposition does not necessarily give an orthonormal basis for $\mathrm{Im}(A)$, since some of the diagonal elements of $R$ become zero. For example, consider the following QR decomposition:

$$A = [a_1\ a_2\ a_3] = [q_1\ q_2\ q_3]\begin{bmatrix} 1 & 2 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

Though $\mathrm{rank}(A) = 2$, it is impossible to span $\mathrm{Im}(A) = \mathrm{span}\{q_1, q_2 + q_3\}$ by any two vectors from $q_1, q_2, q_3$. However, for $\mathrm{rank}(A) = r < n$, it is easy to modify the QR decomposition algorithm with column pivoting so that the $r$ column vectors $q_1, \ldots, q_r$ form an orthonormal basis of $\mathrm{Im}(A)$; see [59].
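The effect of column pivoting can be illustrated with a modified Gram-Schmidt sketch in NumPy that, at each step, orthonormalizes the remaining column of largest norm. This is a didactic stand-in, not the Householder-based pivoted routine described in [59]; the function name `pivoted_gram_schmidt` and the absolute tolerance `tol` are assumptions. It is applied to a matrix $A = Q_0R_0$ with the triangular factor of the example above and a random orthogonal $Q_0$.

```python
import numpy as np

def pivoted_gram_schmidt(A, tol=1e-10):
    """Modified Gram-Schmidt with column pivoting (sketch).

    Returns Q (m x r), R (r x n) and a permutation perm with A[:, perm] = Q R,
    where r is the numerical rank; the columns of Q span Im(A).
    """
    W = np.array(A, dtype=float)            # working copy: residual columns
    m, n = W.shape
    perm = np.arange(n)
    Q = np.zeros((m, 0))
    R = np.zeros((0, n))
    for k in range(min(m, n)):
        norms = np.linalg.norm(W[:, k:], axis=0)
        j = k + int(np.argmax(norms))       # pivot: largest remaining column
        if norms[j - k] <= tol:
            break                           # remaining columns are numerically dependent
        W[:, [k, j]] = W[:, [j, k]]         # swap columns k and j everywhere
        R[:, [k, j]] = R[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        q = W[:, k] / norms[j - k]
        r_row = q @ W                       # components along q (zero for finished columns)
        W -= np.outer(q, r_row)             # remove them from every column
        Q = np.column_stack([Q, q])
        R = np.vstack([R, r_row])
    return Q, R, perm

Q0, _ = np.linalg.qr(np.random.default_rng(11).standard_normal((3, 3)))
R0 = np.array([[1.0, 2.0, 4.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]])
A = Q0 @ R0                                 # rank 2
Q, R, perm = pivoted_gram_schmidt(A)        # Q has 2 orthonormal columns spanning Im(A)
```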
2.5 Projections and Orthogonal Projections


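Definition 2.2. Suppose that $\mathbb{R}^n$ is given by a direct sum of subspaces $\mathcal{V}$ and $\mathcal{W}$, i.e.,

$$\mathbb{R}^n = \mathcal{V} \dotplus \mathcal{W}, \quad \mathcal{V} \cap \mathcal{W} = \{0\}$$

Then $x \in \mathbb{R}^n$ can be uniquely expressed as

$$x = v + w, \quad v \in \mathcal{V},\ w \in \mathcal{W} \qquad (2.15)$$

where $v$ is the projection of $x$ onto $\mathcal{V}$ along $\mathcal{W}$, and $w$ is the projection of $x$ onto $\mathcal{W}$ along $\mathcal{V}$. The uniqueness follows from $\mathcal{V} \cap \mathcal{W} = \{0\}$.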


Figure 2.2. Oblique (or parallel) projection


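The projection is often called the oblique (or parallel) projection; see Figure 2.2. We write the projection operator that maps $x$ onto $\mathcal{V}$ along $\mathcal{W}$ as $P_{\mathcal{V}|\mathcal{W}}$. Then we have $v = P_{\mathcal{V}|\mathcal{W}}(x)$ and $w = P_{\mathcal{W}|\mathcal{V}}(x)$, and hence the unique decomposition (2.15) is written as

$$x = P_{\mathcal{V}|\mathcal{W}}(x) + P_{\mathcal{W}|\mathcal{V}}(x)$$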


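We show that the projection is a linear operator. For $x, y \in \mathbb{R}^n$, we have the following decompositions:

$$x = v + w, \quad y = u + z, \quad v, u \in \mathcal{V},\ w, z \in \mathcal{W}$$

Since $x + y = (v + u) + (w + z)$, $v + u \in \mathcal{V}$, $w + z \in \mathcal{W}$, we see that $v + u$ is the oblique projection of $x + y$ onto $\mathcal{V}$ along $\mathcal{W}$. Hence we have

$$P_{\mathcal{V}|\mathcal{W}}(x + y) = v + u = P_{\mathcal{V}|\mathcal{W}}(x) + P_{\mathcal{V}|\mathcal{W}}(y)$$

Moreover, for any $a \in \mathbb{R}$, we get $ax = av + aw$, $av \in \mathcal{V}$, $aw \in \mathcal{W}$, so that $av$ is the oblique projection of $ax$ onto $\mathcal{V}$ along $\mathcal{W}$, implying that

$$P_{\mathcal{V}|\mathcal{W}}(ax) = av = aP_{\mathcal{V}|\mathcal{W}}(x)$$

From the above, we see that the projection $P_{\mathcal{V}|\mathcal{W}}$ is a linear operator on $\mathbb{R}^n$, so that it can be expressed as a matrix.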
Lemma 2.4. Suppose that $P \in \mathbb{R}^{n\times n}$ is idempotent, i.e.,

$$P^2 = P \qquad (2.16)$$

Then we have

$$\mathrm{Ker}(P) = \mathrm{Im}(I_n - P) \qquad (2.17)$$

and vice versa.

Proof. Let $x \in \mathrm{Ker}(P)$. Then, since $Px = 0$, we get $x = (I - P)x \in \mathrm{Im}(I - P)$, implying that $\mathrm{Ker}(P) \subset \mathrm{Im}(I - P)$. Also, for any $x \in \mathbb{R}^n$, we see that $P(I - P)x = (P - P^2)x = 0$, showing that $\mathrm{Im}(I - P) \subset \mathrm{Ker}(P)$. This proves (2.17). Conversely, for any $z \in \mathbb{R}^n$, let $x = (I - P)z$. Then we have $x \in \mathrm{Ker}(P)$, so that $0 = Px = P(I - P)z$ holds for any $z \in \mathbb{R}^n$, implying that $P^2 = P$.


Corollary 2.1. Suppose that (2.16) holds. Then we have

$$\mathbb{R}^n = \mathrm{Im}(P) \dotplus \mathrm{Ker}(P) \qquad (2.18)$$

Proof. Since any $x \in \mathbb{R}^n$ can be written as $x = Px + (I - P)x$, we see from (2.17) that

$$\mathbb{R}^n = \mathrm{Im}(P) \vee \mathrm{Im}(I - P) = \mathrm{Im}(P) \vee \mathrm{Ker}(P) \qquad (2.19)$$

Now let $x \in \mathrm{Im}(P) \cap \mathrm{Ker}(P)$. Then we have $x = Py$, $y \in \mathbb{R}^n$, and $Px = 0$. From (2.16), we get $0 = Px = P^2y = Py = x$, and hence $\mathrm{Im}(P) \cap \mathrm{Ker}(P) = \{0\}$. Thus the right-hand side of (2.19) is a direct sum.


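We now provide a necessary and sufficient condition for $P$ to be a matrix that represents an oblique projection.

Lemma 2.5. A matrix $P \in \mathbb{R}^{n\times n}$ is the projection matrix onto $\mathrm{Im}(P)$ along $\mathrm{Ker}(P)$ if and only if (2.16) holds.

Proof. We first prove the necessity. Since, for any $x \in \mathbb{R}^n$, $v = Px \in \mathrm{Im}(P)$, we have $P(Px) = Pv = v = Px$ for all $x$, implying that $P^2 = P$ holds. Conversely, to prove the sufficiency, we define

$$\mathcal{V} := \{v \mid v = Px,\ x \in \mathbb{R}^n\}, \quad \mathcal{W} := \{w \mid w = (I - P)x,\ x \in \mathbb{R}^n\}$$

By Lemma 2.4 and Corollary 2.1, we have $\mathcal{W} = \mathrm{Ker}(P)$ and $\mathcal{V} \cap \mathcal{W} = \{0\}$, so that $x \in \mathbb{R}^n$ is decomposed uniquely as

$$x = Px + (I - P)x = v + w, \quad v \in \mathcal{V},\ w \in \mathcal{W}$$

From Definition 2.2, we see that $P$ is the projection matrix onto $\mathcal{V} = \mathrm{Im}(P)$ along $\mathcal{W} = \mathrm{Ker}(P)$.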


Example 2.1. It can be shown that $P \in \mathbb{R}^{n\times n}$ is a projection if and only if $P$ is expressed as

$$P = T\Lambda_rT^{-1} \qquad (2.20)$$

where $T$ is a nonsingular matrix, and $\Lambda_r$ is given by

$$\Lambda_r = \mathrm{diag}(\underbrace{1, \ldots, 1}_{r}, 0, \ldots, 0) \qquad (2.21)$$

In fact, it is obvious that $P$ of (2.20) satisfies $P^2 = P$. Conversely, suppose that $P^2 = P$ holds. Let

$$\mathrm{Im}(P) = \mathrm{span}\{t_1, \ldots, t_r\}, \quad \mathrm{Ker}(P) = \mathrm{span}\{t_{r+1}, \ldots, t_n\}$$

Noting that $x \in \mathrm{Im}(P) \iff Px = x$ and that $x \in \mathrm{Ker}(P) \iff Px = 0$, we get

$$P[t_1\ \cdots\ t_r\ t_{r+1}\ \cdots\ t_n] = [t_1\ \cdots\ t_r\ t_{r+1}\ \cdots\ t_n]\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$$

From Corollary 2.1, $T = [t_1\ \cdots\ t_n]$ is nonsingular, showing that (2.20) holds. Thus it follows from (2.20) that if $P^2 = P$, then $\mathrm{rank}(P) = \mathrm{trace}(P)$.
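Formula (2.20) gives a direct way to build an oblique projection matrix from bases of $\mathcal{V}$ and $\mathcal{W}$. The NumPy sketch below uses small hypothetical bases: $\mathcal{V} = \mathrm{span}\{t_1\}$ ($r = 1$) and $\mathcal{W} = \mathrm{span}\{t_2, t_3\}$ in $\mathbb{R}^3$.

```python
import numpy as np

# Hypothetical bases: V = span{t1} (r = 1) and W = span{t2, t3} in R^3.
t1 = np.array([1.0, 1.0, 0.0])
t2 = np.array([0.0, 1.0, 0.0])
t3 = np.array([0.0, 0.0, 1.0])
T = np.column_stack([t1, t2, t3])          # nonsingular since R^3 = V (+) W
r = 1
Lam_r = np.diag([1.0] * r + [0.0] * (3 - r))
P = T @ Lam_r @ np.linalg.inv(T)           # (2.20): projection onto V along W

x = np.array([2.0, 5.0, -1.0])
v = P @ x                                  # component in V: [2, 2, 0]
w = (np.eye(3) - P) @ x                    # component in W: [0, 3, -1]
```

Here `P` is idempotent but not symmetric, so it is an oblique rather than an orthogonal projection, and its trace equals its rank.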


Definition 2.3. Suppose that $\mathcal{V}$ is a subspace of $\mathbb{R}^n$. Then any $x \in \mathbb{R}^n$ can be uniquely decomposed as

$$x = v + w, \quad v \in \mathcal{V},\ w \in \mathcal{V}^\perp \qquad (2.22)$$

This is the particular case of Definition 2.2 with $\mathcal{W} = \mathcal{V}^\perp$, and $v$ is called the orthogonal projection of $x$ onto $\mathcal{V}$. See Figure 2.3 below.

Figure 2.3. Orthogonal projection


For $x, y \in \mathbb{R}^n$, we consider the orthogonal decompositions $x = v_1 + w_1$ and $y = v_2 + w_2$, where $v_1, v_2 \in \mathcal{V}$ and $w_1, w_2 \in \mathcal{V}^\perp$. Let $P$ be the orthogonal projection onto $\mathcal{V}$ along $\mathcal{V}^\perp$. Then $v_1 = Px$ and $v_2 = Py$. Since $v_2 \perp w_1$ and $v_1 \perp w_2$,

$$(x, Py) = (v_1 + w_1, v_2) = (v_1, v_2) = (v_1, v_2 + w_2) = (Px, y)$$

holds for any $x, y$, so that we have $P = P^T$. The next lemma provides a necessary and sufficient condition for $P$ to be an orthogonal projection.


Lemma 2.6. The matrix $P \in \mathbb{R}^{n\times n}$ is the orthogonal projection onto $\mathrm{Im}(P)$ if and only if the following two conditions hold:

$$\text{(i)}\ P^2 = P, \qquad \text{(ii)}\ P^T = P \qquad (2.23)$$

Proof. (Necessity) It is clear from Lemma 2.5 that $P^2 = P$ holds. The fact that $P^T = P$ has already been proved above.

(Sufficiency) It follows from Lemma 2.5 that condition (i) implies that $P$ is the projection matrix onto $\mathrm{Im}(P)$ along $\mathrm{Ker}(P)$. Condition (ii) implies that $\mathrm{Ker}(P) = \mathrm{Ker}(P^T) = (\mathrm{Im}\,P)^\perp$. This proves the sufficiency.


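Let $A \in \mathbb{R}^{n\times r}$ with $\mathrm{rank}(A) = r$ and $\mathrm{Im}(A) = \mathcal{A} \subset \mathbb{R}^n$. Let the QR decomposition of $A$ be given by (2.14). Then it follows that $\mathrm{Im}(Q_A) = \mathcal{A}$. Also, define

$$P_{\mathcal{A}} := Q_AQ_A^T \in \mathbb{R}^{n\times n} \qquad (2.24)$$

It is clear that $P_{\mathcal{A}}^T = P_{\mathcal{A}}$ and $P_{\mathcal{A}}^2 = P_{\mathcal{A}}$, so that conditions (i) and (ii) of Lemma 2.6 are satisfied. Therefore, if we decompose $z \in \mathbb{R}^n$ as

$$z = x + y, \quad x \in \mathcal{A},\ y \in \mathcal{A}^\perp \qquad (2.25)$$

then we get $x = P_{\mathcal{A}}z$ and $y = (I - P_{\mathcal{A}})z$. Hence, $P_{\mathcal{A}}$ and $I - P_{\mathcal{A}}$ are the orthogonal projections onto $\mathcal{A}\,(= \mathrm{Im}\,A)$ and $\mathcal{A}^\perp$, respectively.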
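A NumPy sketch of (2.24) and (2.25): the orthogonal projector is built from the reduced QR factor, and the two conditions of Lemma 2.6 together with the orthogonality of the decomposition are checked. The sizes and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 5, 2
A = rng.standard_normal((n, r))         # full column rank (with probability one)
Q_A, _ = np.linalg.qr(A)                # reduced QR: Q_A is n x r, Im(Q_A) = Im(A)
P_A = Q_A @ Q_A.T                       # (2.24)

z = rng.standard_normal(n)
x = P_A @ z                             # component in Im(A)
y = (np.eye(n) - P_A) @ z               # component in the orthogonal complement
```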


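Lemma 2.7. Suppose that $\mathcal{A}$ is a subspace of $\mathbb{R}^n$. Then, for any $z \in \mathbb{R}^n$, $P_{\mathcal{A}}z$ is the unique vector satisfying

$$\min_{x \in \mathcal{A}}\|z - x\| = \|z - P_{\mathcal{A}}z\|$$

Proof. If $z \in \mathcal{A}$, then $P_{\mathcal{A}}z = z$. Now suppose that $z \notin \mathcal{A}$. For any $x \in \mathcal{A}$, we have $x - P_{\mathcal{A}}z \in \mathcal{A}$, but $(I - P_{\mathcal{A}})z$ is orthogonal to $\mathcal{A}$. Thus it follows that $x - P_{\mathcal{A}}z \perp (I - P_{\mathcal{A}})z$. Hence

$$\|z - x\|^2 = \|(I - P_{\mathcal{A}})z - (x - P_{\mathcal{A}}z)\|^2 = \|(I - P_{\mathcal{A}})z\|^2 + \|x - P_{\mathcal{A}}z\|^2$$

The right-hand side is minimized by $x = P_{\mathcal{A}}z$, which is unique.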
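Lemma 2.7 is the geometric content of least squares: the closest point of $\mathrm{Im}(A)$ to $z$ is the fitted vector $A\hat\theta$ of the least-squares problem $\min_\theta\|z - A\theta\|$. A NumPy sketch comparing the projection with `np.linalg.lstsq` (random data; seed illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))            # basis of a 3-dimensional subspace of R^6
z = rng.standard_normal(6)

Q_A, _ = np.linalg.qr(A)
proj = Q_A @ (Q_A.T @ z)                   # P_A z without forming P_A

theta, *_ = np.linalg.lstsq(A, z, rcond=None)
best = A @ theta                           # least-squares fit: closest point of Im(A) to z

# Any other point of Im(A) is farther from z, as Lemma 2.7 states.
other = A @ (theta + rng.standard_normal(3))
```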


2.6 Singular Value Decomposition 


Though the singular value decomposition (SVD) can be applied to complex matrices, 
it is assumed here that matrices are real. 


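Lemma 2.8. Suppose that the rank of $A \in \mathbb{R}^{m\times n}$ is $r \le \min(m, n)$. Then there exist orthogonal matrices $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ such that

$$U^TAV = \Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma_r = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \tag{2.26}$$

where $U^TU = I_m$, $V^TV = I_n$, and

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0, \qquad p = \min(m, n)$$

We say that $\sigma_1, \ldots, \sigma_p$ are the singular values of $A$, and that (2.26) is the singular value decomposition (SVD).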

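Proof. Suppose that we know the eigenvalue decomposition of a nonnegative definite matrix. Since $A^TA \in \mathbb{R}^{n\times n}$ is nonnegative definite, it can be diagonalized by an orthogonal transform $V \in \mathbb{R}^{n\times n}$. Let the eigenvalues of $A^TA$ be given by $\lambda_1, \lambda_2, \ldots, \lambda_n$, and let the corresponding eigenvectors be given by $v_1, v_2, \ldots, v_n \in \mathbb{R}^n$. Thus we have $A^TAv_i = \lambda_iv_i$, $i = 1, \ldots, n$. Moreover, since $\mathrm{rank}(A) = r$, we have $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0$. Define $\sigma_i = \sqrt{\lambda_i}$, $i = 1, \ldots, n$, and $V = [V_r\ \bar{V}_r]$, where

$$V_r = [v_1\ v_2\ \cdots\ v_r], \qquad \bar{V}_r = [v_{r+1}\ v_{r+2}\ \cdots\ v_n]$$

It then follows that $V^TV = I_n$, and that

$$A^TAv_i = \sigma_i^2v_i, \qquad i = 1, \ldots, r \tag{2.27a}$$
$$A^TAv_i = 0, \qquad i = r+1, \ldots, n \tag{2.27b}$$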


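Also we define $U_r := AV_r\Sigma_r^{-1} \in \mathbb{R}^{m\times r}$. We see from (2.27a) that $A^TAV_r = V_r\Sigma_r^2$ holds and

$$U_r^TU_r = \Sigma_r^{-1}V_r^TA^TAV_r\Sigma_r^{-1} = \Sigma_r^{-1}V_r^TV_r\Sigma_r^2\Sigma_r^{-1} = I_r \tag{2.28}$$

In other words, the column vectors of $U_r$ form an orthonormal set. Now we choose $\bar{U}_r \in \mathbb{R}^{m\times(m-r)}$ so that

$$U = [U_r\ \bar{U}_r] \in \mathbb{R}^{m\times m}$$

is an orthogonal matrix, i.e., $U^TU = I_m$. Then it follows that

$$U^TAV = \begin{bmatrix} U_r^TAV_r & U_r^TA\bar{V}_r \\ \bar{U}_r^TAV_r & \bar{U}_r^TA\bar{V}_r \end{bmatrix}$$

We see from (2.28) that the (1, 1)-block element of the right-hand side of the above equation is $U_r^TAV_r = \Sigma_r^{-1}V_r^TA^TAV_r = \Sigma_r$. From (2.27b), we get $\|Av_i\|^2 = v_i^TA^TAv_i = 0$ for $i = r+1, \ldots, n$, i.e., $A\bar{V}_r = 0$. Thus the (1, 2)- and (2, 2)-block elements are zero matrices. Also, since $AV_r = U_r\Sigma_r$ and the columns of $U_r$ and $\bar{U}_r$ are mutually orthogonal, $\bar{U}_r^TAV_r = \bar{U}_r^TU_r\Sigma_r = 0$, implying that the (2, 1)-block element is also a zero matrix. Thus we have shown that

$$U^TAV = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}$$

This completes the proof.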


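It is clear that (2.26) can be expressed as

$$A = U\Sigma V^T = U_r\Sigma_rV_r^T \tag{2.29}$$

where $U_r \in \mathbb{R}^{m\times r}$ and $V_r \in \mathbb{R}^{n\times r}$. Note that in the following we often drop the subscript $r$ and write (2.29) as $A = U\Sigma V^T$ with $U \in \mathbb{R}^{m\times r}$, $\Sigma \in \mathbb{R}^{r\times r}$, and $V \in \mathbb{R}^{n\times r}$. This is called the reduced SVD.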

Let $\sigma(A)$ be the set of singular values of $A$, and $\sigma_i(A)$ the $i$th singular value. As we can see from the above proof, the singular values of $A$ are equal to the nonnegative square roots of the eigenvalues of $A^TA$, i.e., for $A \in \mathbb{R}^{m\times n}$,

$$\sigma_i(A) = \sqrt{\lambda_i(A^TA)}, \qquad i = 1, \ldots, p$$

Also, the column vectors of $U$, the left singular vectors of $A$, are the eigenvectors of $AA^T$, and the column vectors of $V$, the right singular vectors of $A$, are the eigenvectors of $A^TA$. From (2.29), we have $AV_r = U_r\Sigma_r$ and $A^TU_r = V_r\Sigma_r$, so that the $i$th right singular vector and the $i$th left singular vector are related by

$$Av_i = \sigma_iu_i, \qquad A^Tu_i = \sigma_iv_i, \qquad i = 1, \ldots, r$$

In the following, $\bar{\sigma}(A)$ and $\underline{\sigma}(A)$ denote the maximum and the minimum singular values, respectively.
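These relations between the SVD and the eigenvalue problem for $A^TA$ can be checked with `numpy.linalg.svd`. The following sketch builds a rank-$r$ matrix from random factors, which is an illustrative choice. It then verifies $\sigma_i^2 = \lambda_i(A^TA)$, the reduced SVD (2.29), and $Av_i = \sigma_iu_i$, $A^Tu_i = \sigma_iv_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r by construction

U, s, Vt = np.linalg.svd(A)               # full SVD; s holds min(m, n) values, descending
V = Vt.T

# sigma_i^2 are the eigenvalues of A^T A (eigvalsh returns them in ascending order)
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.allclose(s ** 2, lam[:len(s)], atol=1e-10))            # True

# Reduced SVD (2.29) built from the r nonzero singular values
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), V[:, :r]
print(np.allclose(A, Ur @ Sr @ Vr.T))                           # True

# A v_i = sigma_i u_i and A^T u_i = sigma_i v_i, i = 1, ..., r
print(np.allclose(A @ Vr, Ur @ Sr), np.allclose(A.T @ Ur, Vr @ Sr))   # True True
```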


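Lemma 2.9. Suppose that $\mathrm{rank}(A) = r \le \min(m, n)$. Then the following properties (i) to (v) hold.

(i) Images and kernels of $A$ and $A^T$:
$$\mathrm{Im}(A) = \mathrm{Im}(U_r), \qquad \mathrm{Ker}(A) = \mathrm{Im}(\bar{V}_r)$$
$$\mathrm{Im}(A^T) = \mathrm{Im}(V_r), \qquad \mathrm{Ker}(A^T) = \mathrm{Im}(\bar{U}_r)$$

(ii) The dyadic decomposition of $A$:
$$A = \sum_{i=1}^{r} \sigma_iu_iv_i^T \quad (= U_r\Sigma_rV_r^T)$$

(iii) The Frobenius norm and 2-norm:
$$\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2}, \qquad \|A\|_2 = \sigma_1$$

(iv) Equivalence of norms:
$$\|A\|_2 \le \|A\|_F \le \sqrt{p}\,\|A\|_2, \qquad p = \min(m, n)$$

(v) The approximation by a lower rank matrix: Define the matrix $A_k$ by
$$A_k = \sum_{i=1}^{k} \sigma_iu_iv_i^T, \qquad k < r$$
Then we have $\mathrm{rank}(A_k) = k$, and
$$\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}$$
where $B \in \mathbb{R}^{m\times n}$.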
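Proof. For the proofs of (i) to (iv), see [59]. We prove only (v). Since

$$A - A_k = U\,\mathrm{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_p)\,V^T$$

we have $\|A - A_k\|_2 = \sigma_{k+1}$. Let $B \in \mathbb{R}^{m\times n}$ be a matrix with rank $k$. Then it suffices to show that $\|A - B\|_2 \ge \sigma_{k+1}$. Let $\{x_i \in \mathbb{R}^n,\ i = 1, \ldots, n-k\}$ be orthonormal vectors such that $\mathrm{Ker}(B) = \mathrm{span}\{x_1, \ldots, x_{n-k}\}$. Define also $\mathcal{V}_{k+1} := \mathrm{span}\{v_1, \ldots, v_{k+1}\}$. We see that $\dim\mathrm{Ker}(B) = n - k$ and $\dim(\mathcal{V}_{k+1}) = k + 1$. Since both are subspaces of $\mathbb{R}^n$ and $(n - k) + (k + 1) > n$, we have $\mathrm{Ker}(B) \cap \mathcal{V}_{k+1} \ne \{0\}$.

Let $z \in \mathrm{Ker}(B) \cap \mathcal{V}_{k+1} \subset \mathbb{R}^n$ be a vector with $\|z\| = 1$. Then it follows that $Bz = 0$ and, since $v_i^Tz = 0$ for $i > k + 1$,

$$Az = \sum_{i=1}^{p} \sigma_iu_i(v_i^Tz) = \sum_{i=1}^{k+1} \sigma_i(v_i^Tz)u_i$$

Since $z \in \mathcal{V}_{k+1}$ and $v_1, \ldots, v_{k+1}$ are orthonormal, $\sum_{i=1}^{k+1}(v_i^Tz)^2 = \|z\|^2 = 1$. Hence

$$\|A - B\|_2^2 \ge \|(A - B)z\|^2 = \|Az\|^2 = \sum_{i=1}^{k+1} \sigma_i^2(v_i^Tz)^2 \ge \sigma_{k+1}^2\sum_{i=1}^{k+1}(v_i^Tz)^2 = \sigma_{k+1}^2$$

as was to be proved.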
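Lemma 2.9 (v) says that truncating the dyadic sum gives the best rank-$k$ approximation in the 2-norm. The following minimal NumPy sketch uses random data and also checks (iii) and (iv). The competitor matrix `B` is just one random rank-$k$ matrix and is not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # dyadic sum of Lemma 2.9 (v)
# NumPy indices start at 0, so s[k] is sigma_{k+1}
print(np.linalg.matrix_rank(A_k) == k)                        # True
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))           # True

# Another rank-k matrix does at least as badly
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
print(np.linalg.norm(A - B, 2) >= s[k])                       # True

# Lemma 2.9 (iii) and (iv)
p = min(A.shape)
fro, two = np.linalg.norm(A, "fro"), np.linalg.norm(A, 2)
print(np.isclose(fro, np.sqrt(np.sum(s ** 2))), np.isclose(two, s[0]))   # True True
print(two <= fro <= np.sqrt(p) * two)                         # True
```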


Finding the rank of a matrix is most reliably done by the SVD. Let the SVD of $A \in \mathbb{R}^{m\times n}$ be given by (2.26). Let $E$ be a perturbation to the matrix $A$, and let $\{\tilde{\sigma}_i,\ i = 1, \ldots, p\}$ be the singular values of the perturbed matrix $A + E$. Then it follows from [59] that

$$|\tilde{\sigma}_i - \sigma_i| \le \|E\|_2, \qquad i = 1, \ldots, p \tag{2.30}$$

This implies that the singular values are not very sensitive to perturbations. We now define a matrix $B$ with rank $r - 1$ as

$$B = U\,\mathrm{diag}(\sigma_1, \ldots, \sigma_{r-1}, 0, \ldots, 0)\,V^T$$

Then we have $\|A - B\|_2 = \sigma_r$, and by Lemma 2.9 (v) no matrix of rank $r - 1$ is closer to $A$. Thus, for any matrix $B$ satisfying $\|A - B\|_2 < \sigma_r$, the rank of $B$ is greater than or equal to $r$. Hence, if we can choose a "zero threshold" $\delta < \sigma_r$ and regard singular values below $\delta$ as zero, we can say that $A$ has numerical rank $r$. Thus the smallest nonzero singular value plays a significant role in determining the numerical rank of matrices.
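The perturbation bound (2.30) and the role of a zero threshold can be seen in a small experiment. The sketch below builds a rank-3 matrix with prescribed singular values $3, 2, 1$, adds a small random perturbation, and counts the singular values above a threshold $\delta$. The helper `numerical_rank` and the value $\delta = 10^{-3}$ are illustrative assumptions.

```python
import numpy as np

def random_orthogonal(k, rng):
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return Q

def numerical_rank(M, delta):
    """Number of singular values above the zero threshold delta (illustrative helper)."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > delta))

rng = np.random.default_rng(4)
m, n = 8, 6
Sigma = np.zeros((m, n))
Sigma[:3, :3] = np.diag([3.0, 2.0, 1.0])             # exact rank 3, sigma_3 = 1
A = random_orthogonal(m, rng) @ Sigma @ random_orthogonal(n, rng).T
E = 1e-6 * rng.standard_normal((m, n))                # small perturbation

s = np.linalg.svd(A, compute_uv=False)
s_tilde = np.linalg.svd(A + E, compute_uv=False)
print(np.all(np.abs(s_tilde - s) <= np.linalg.norm(E, 2) + 1e-12))   # True, bound (2.30)

print(np.linalg.matrix_rank(A + E))    # 6: the perturbed matrix is generically of full rank
print(numerical_rank(A + E, 1e-3))     # 3
```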


2.7 Least-Squares Method 

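In this section, we consider the least-squares problem:

$$\min_{x \in \mathbb{R}^n} \|Ax - b\|, \qquad A \in \mathbb{R}^{m\times n}, \quad b \in \mathbb{R}^m \tag{2.31}$$

where $m \ge n$. Suppose that $\mathrm{rank}(A) = n$, and let the QR decomposition of $A$ be given by

$$A = Q\begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Q \in \mathbb{R}^{m\times m}, \quad R \in \mathbb{R}^{n\times n}$$

Since the 2-norm is invariant under orthogonal transforms, we have

$$\|Ax - b\|^2 = \|Q^T(Ax - b)\|^2 = \left\|\begin{bmatrix} R \\ 0 \end{bmatrix}x - \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}\right\|^2, \qquad \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} := Q^Tb$$

where $b_1 \in \mathbb{R}^n$, $b_2 \in \mathbb{R}^{m-n}$. Hence, it follows that

$$\|Ax - b\|^2 = \|Rx - b_1\|^2 + \|b_2\|^2$$

Since the second term $\|b_2\|^2$ on the right-hand side is independent of $x$, the least-squares problem is reduced to

$$Rx = b_1 \iff \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ 0 & & & r_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix}$$

Since $R$ is upper triangular, the solutions $x_n, x_{n-1}, \ldots, x_1$ are recursively computed by back substitution, starting from $x_n = \beta_n/r_{nn}$.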
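The QR-based procedure above can be written out as follows. This is a sketch in Python with NumPy that assumes $A$ has full column rank. The names `back_substitution` and `lstsq_qr` are hypothetical helpers, and `numpy.linalg.lstsq` is used only as a cross-check.

```python
import numpy as np

def back_substitution(R, beta):
    """Solve R x = beta for an upper triangular, nonsingular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):             # x_n first, then x_{n-1}, ..., x_1
        x[i] = (beta[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def lstsq_qr(A, b):
    """Minimize ||Ax - b|| for full column rank A (m >= n) via A = Q [R; 0]."""
    m, n = A.shape
    Q, R_full = np.linalg.qr(A, mode="complete")
    c = Q.T @ b                                # [b1; b2] = Q^T b
    b1, b2 = c[:n], c[n:]
    x = back_substitution(R_full[:n, :], b1)
    return x, np.linalg.norm(b2)               # minimizer and residual norm ||b2||

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
x, res = lstsq_qr(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref), np.isclose(res, np.linalg.norm(A @ x - b)))   # True True
```

The explicit loop only mirrors the back substitution step in the text. In practice a library triangular solver would normally be used.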

If the rank of $A \in \mathbb{R}^{m\times n}$ is less than $n$, then $R$ is singular (some of its diagonal elements are zero), so that the solution of the least-squares problem is not unique. However, by imposing the additional condition that the norm $\|x\|$ be minimum, we can obtain a unique solution. In the following, we explain a method of finding the minimum norm solution of the least-squares problem by means of the SVD.


Lemma 2.10. Suppose that the rank of $A \in \mathbb{R}^{m\times n}$ is $r \le \min(m, n)$, and the reduced SVD is given by $A = U\Sigma V^T$, where $U \in \mathbb{R}^{m\times r}$ and $V \in \mathbb{R}^{n\times r}$. Then there exists a unique matrix $X$ satisfying the Moore-Penrose conditions:

(i) $AXA = A$
(ii) $XAX = X$
(iii) $(AX)^T = AX$
(iv) $(XA)^T = XA$

The unique solution is given by

$$X = V\Sigma^{-1}U^T \ (= A^\dagger) \tag{2.32}$$

In this case, $X = A^\dagger$ is called the Moore-Penrose generalized inverse, or the pseudo-inverse, of $A$.

Proof. [83] It is easy to see that $A^\dagger$ of (2.32) satisfies the above four conditions. To prove uniqueness, suppose that both $X$ and $Y$ satisfy the conditions. Then it follows that

$$\begin{aligned} X &= XAX = X(AX)^T = XX^TA^T = XX^T(AYA)^T = X(AX)^T(AY)^T = XAXAY = XAY \\ &= XAYAY = (XA)^T(YA)^TY = A^TX^TA^TY^TY = (AXA)^TY^TY = A^TY^TY = (YA)^TY = YAY = Y \end{aligned}$$

as was to be proved. Note that all four conditions are used in the proof.
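Formula (2.32) can be turned into a small routine, and the four Moore-Penrose conditions can then be checked directly. In this sketch, the name `pinv_svd` and the relative tolerance that decides which singular values count as nonzero are assumptions. They are not part of Lemma 2.10.

```python
import numpy as np

def pinv_svd(A, tol=1e-10):
    """A^dagger = V_r Sigma_r^{-1} U_r^T as in (2.32); singular values below
    tol * sigma_1 are treated as zero (an assumption of this sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5 x 4, rank 2
X = pinv_svd(A)

# The four Moore-Penrose conditions of Lemma 2.10
conds = [np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X),
         np.allclose((A @ X).T, A @ X), np.allclose((X @ A).T, X @ A)]
print(conds)                                                    # [True, True, True, True]

# Full column rank case: A^dagger = (A^T A)^{-1} A^T
F = rng.standard_normal((5, 3))
print(np.allclose(pinv_svd(F), np.linalg.inv(F.T @ F) @ F.T))   # True
```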
In the above lemma, if $\mathrm{rank}(A) = n$, then $A^\dagger = (A^TA)^{-1}A^T$ and $A^\dagger A = I_n$. If $\mathrm{rank}(A) = m$, then we have $A^\dagger = A^T(AA^T)^{-1}$ and $AA^\dagger = I_m$.


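Lemma 2.11. Suppose that the rank of $A \in \mathbb{R}^{m\times n}$ is $r \le n$. Then a general solution of the least-squares problem

$$\min_{x \in \mathbb{R}^n} \|Ax - b\|, \qquad A \in \mathbb{R}^{m\times n}, \quad b \in \mathbb{R}^m$$

is given by

$$x = A^\dagger b + (I_n - A^\dagger A)z, \qquad \forall z \in \mathbb{R}^n \tag{2.33}$$

Moreover, $x = A^\dagger b$ is the unique minimum norm solution.

Proof. It follows from Lemma 2.7 that the minimizing vector $x$ should satisfy $Ax = P_{\mathcal{A}}b$, where $P_{\mathcal{A}}$ is the orthogonal projection onto $\mathrm{Im}(A)$, which is given by $UU^T = AA^\dagger$. Since $A(A^\dagger b) = P_{\mathcal{A}}b$, we see that $x = A^\dagger b$ is a solution of the least-squares problem. We now seek a general solution of the form $x = A^\dagger b + y$, where $y$ is to be determined. Since

$$Ay = A(x - A^\dagger b) = Ax - P_{\mathcal{A}}b = 0$$

we get $y \in \mathrm{Ker}(A)$. By using $A = U\Sigma V^T$,

$$A^\dagger A = V\Sigma^{-1}U^TU\Sigma V^T = VV^T$$

Since $VV^T = A^\dagger A$ is the orthogonal projection onto $\mathrm{Im}(A^T) = (\mathrm{Ker}\,A)^\perp$, the orthogonal projection onto $\mathrm{Ker}(A)$ is given by $I_n - VV^T = I_n - A^\dagger A$. Thus $y \in \mathrm{Ker}(A)$ is expressed as

$$y = (I_n - A^\dagger A)z, \qquad z \in \mathbb{R}^n$$

This proves (2.33). Finally, since $A^\dagger b \in \mathrm{Im}(A^T)$ and $(I_n - A^\dagger A)z \in \mathrm{Ker}(A)$ are orthogonal, we get

$$\|x\|^2 = \|A^\dagger b\|^2 + \|(I_n - A^\dagger A)z\|^2 \ge \|A^\dagger b\|^2$$

where the equality holds if and only if $(I_n - A^\dagger A)z = 0$, i.e., $x = A^\dagger b$. This completes the proof.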
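Lemma 2.11 can be illustrated on a rank-deficient example. Every vector of the form (2.33) attains the same residual, and $A^\dagger b$ has the smallest norm among them. The following sketch uses illustrative random data, and the rank $r = 2$ is fixed by construction.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2 < n = 4
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T   # A^dagger via (2.32)
x0 = A_pinv @ b                                          # minimum norm solution

N = np.eye(4) - A_pinv @ A                               # orthogonal projection onto Ker(A)
z = rng.standard_normal(4)
x = x0 + N @ z                                           # another solution of the form (2.33)

res0 = np.linalg.norm(A @ x0 - b)
print(np.isclose(np.linalg.norm(A @ x - b), res0))       # True: same residual
print(np.linalg.norm(x0) <= np.linalg.norm(x))           # True: x0 has the smallest norm
print(abs(x0 @ (N @ z)) < 1e-10)                         # True: the two parts are orthogonal
```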


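Lemma 2.12. A general solution of the least-squares problem

$$\min_{X \in \mathbb{R}^{n\times p}} \|AX - B\|_F, \qquad A \in \mathbb{R}^{m\times n}, \quad B \in \mathbb{R}^{m\times p}$$

is given by

$$X = A^\dagger B + (I_n - A^\dagger A)Z, \qquad \forall Z \in \mathbb{R}^{n\times p} \tag{2.34}$$

Proof. The proof is similar to that of Lemma 2.11.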


The minimum norm solution defined by (2.33) is expressed as

$$x = A^\dagger b = V\Sigma^{-1}U^Tb = \sum_{i=1}^{r} \frac{u_i^Tb}{\sigma_i}v_i$$

This indicates that if some singular values are small, then small changes in $A$ and $b$ may result in large changes in the solution $x$. From Lemma 2.9 (v), $\|A - A_{r-1}\|_2 = \sigma_r$. Since the smallest nonzero singular value $\sigma_r$ equals the distance from $A$ to the set of matrices with rank less than $r$, it has the largest effect on the solution $x$. But, since the singular values are scale dependent, the normalized quantity, called the condition number,

$$\kappa(A) = \|A\|_2 \cdot \|A^\dagger\|_2 = \frac{\sigma_1}{\sigma_r}$$

is used as the sensitivity measure of the minimum norm solution to the data.

Since $\sigma_1 \ge \sigma_r$, the condition number satisfies $\kappa(A) \ge 1$. If $\kappa(A)$ is very large, then $A$ is called ill-conditioned. If $\kappa(A)$ is not very large, we say that $A$ is well-conditioned. Obviously, the condition number of any orthogonal matrix is one, so that orthogonal matrices are perfectly conditioned.
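The following sketch constructs a matrix with prescribed singular values $1$, $10^{-2}$, $10^{-6}$, which is an illustrative choice giving $\kappa(A) = 10^6$. It then shows that a perturbation of $b$ along $u_r$ is amplified by roughly $1/\sigma_r$ in the minimum norm solution.

```python
import numpy as np

rng = np.random.default_rng(8)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
s = np.array([1.0, 1e-2, 1e-6])                       # chosen singular values
A = U[:, :3] @ np.diag(s) @ V.T                       # 5 x 3, full column rank

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)
print(np.isclose(kappa, s[0] / s[-1], rtol=1e-6))     # True: sigma_1 / sigma_r = 1e6

# A change of size 1e-8 in b along u_r moves x by about 1e-8 / sigma_r = 1e-2
b = U[:, 0]
db = 1e-8 * U[:, 2]
x = np.linalg.lstsq(A, b, rcond=None)[0]
x_pert = np.linalg.lstsq(A, b + db, rcond=None)[0]
print(np.linalg.norm(x_pert - x))                     # approximately 1e-2
```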


2.8 Rank of Hankel Matrices 


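In this section, we consider the rank of Hankel matrices [51]. We assume that the sequence $h_1, h_2, \ldots$ below is real, but the results are also valid for complex sequences.

Definition 2.4. Consider the infinite matrix

$$H = \begin{bmatrix} h_1 & h_2 & h_3 & \cdots \\ h_2 & h_3 & h_4 & \cdots \\ h_3 & h_4 & h_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{2.35}$$

where the $(i, j)$-element is given by $h_{i+j-1}$. This is called an infinite Hankel matrix, or Hankel operator. It should be noted that $H$ has the same element along each anti-diagonal. Also, define the matrix formed by the first $k$ rows and $l$ columns of $H$ by

$$H_{k,l} = \begin{bmatrix} h_1 & h_2 & h_3 & \cdots & h_l \\ h_2 & h_3 & h_4 & \cdots & h_{l+1} \\ h_3 & h_4 & h_5 & \cdots & h_{l+2} \\ \vdots & \vdots & \vdots & & \vdots \\ h_k & h_{k+1} & h_{k+2} & \cdots & h_{k+l-1} \end{bmatrix} \tag{2.36}$$

This is called a finite Hankel matrix.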
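A finite Hankel matrix $H_{k,l}$ as in (2.36) can be formed directly from the sequence. In the sketch below, `finite_hankel` is a hypothetical helper. Note the shift from the 1-based indexing $h_{i+j-1}$ of the text to NumPy's 0-based arrays.

```python
import numpy as np

def finite_hankel(h, k, l):
    """Return H_{k,l} of (2.36). The 1-based entry (i, j) is h_{i+j-1};
    with h[0] holding h_1, the 0-based entry (i, j) is h[i + j]."""
    if len(h) < k + l - 1:
        raise ValueError("need the terms h_1, ..., h_{k+l-1}")
    return np.array([[h[i + j] for j in range(l)] for i in range(k)])

h = np.arange(1, 8)          # illustrative sequence h_1, ..., h_7 = 1, ..., 7
H = finite_hankel(h, 3, 4)
print(H)
# [[1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]]
```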


Lemma 2.13. Consider the finite Hankel matrix $H_{n,n}$ of order $n$. Suppose that the first $l$ row vectors are linearly independent, but the first $l + 1$ row vectors are linearly dependent. Then it follows that $\det H_{l,l} \ne 0$.

Proof. Let the first $l + 1$ row vectors of $H_{n,n}$ be given by $R_1, R_2, \ldots, R_l, R_{l+1}$. Since, by assumption, $R_1, \ldots, R_l$ are linearly independent, $R_{l+1}$ is expressed as

$$R_{l+1} = \sum_{k=1}^{l} \alpha_kR_{l-k+1}$$

In particular, comparing the elements, we have

$$h_i = \sum_{k=1}^{l} \alpha_kh_{i-k}, \qquad i = l+1, \ldots, l+n \tag{2.37}$$

Then the matrix formed by the first $l$ row vectors $R_1, R_2, \ldots, R_l$ is given by

$$H_{l,n} = \begin{bmatrix} h_1 & h_2 & \cdots & h_n \\ h_2 & h_3 & \cdots & h_{n+1} \\ \vdots & \vdots & & \vdots \\ h_l & h_{l+1} & \cdots & h_{l+n-1} \end{bmatrix} \in \mathbb{R}^{l\times n} \tag{2.38}$$

where the rank of this matrix is $l$.

Now consider the column vectors of $H_{l,n}$. It follows from (2.37) that each column vector from the $(l+1)$th onward is expressed as a linear combination of the $l$ preceding column vectors. Hence, in particular, the $(l+1)$th column vector is linearly dependent on the first $l$ column vectors, and by induction every column vector of $H_{l,n}$ lies in the span of the first $l$ column vectors. But since the matrix of (2.38) has rank $l$, the first $l$ column vectors must be linearly independent, showing that $\det H_{l,l} \ne 0$.


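Example 2.2. Consider a finite symmetric Hankel matrix

$$H_{n,n} = \begin{bmatrix} h_1 & h_2 & \cdots & h_n \\ h_2 & h_3 & \cdots & h_{n+1} \\ \vdots & \vdots & & \vdots \\ h_n & h_{n+1} & \cdots & h_{2n-1} \end{bmatrix} \in \mathbb{R}^{n\times n}$$

Define the anti-diagonal matrix (or the backward identity)

$$J_n = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{n\times n} \tag{2.39}$$

Then it is easy to see that

$$J_nH_{n,n} = \begin{bmatrix} h_n & h_{n+1} & \cdots & h_{2n-1} \\ h_{n-1} & h_n & \cdots & h_{2n-2} \\ \vdots & \vdots & & \vdots \\ h_1 & h_2 & \cdots & h_n \end{bmatrix} =: T_n$$

where the matrix $T_n$ is called a Toeplitz matrix with $t_{ij} = h_{n-i+j}$, i.e., its elements are constant along each diagonal. Also, from $J_n = J_n^T = J_n^{-1}$, we see that $J_nT$ is a Hankel matrix for any Toeplitz matrix $T \in \mathbb{R}^{n\times n}$.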
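The relation $J_nH_{n,n} = T_n$ can be verified for a small $n$. The sketch uses `numpy.fliplr` to form the backward identity $J_n$ and an arbitrary illustrative sequence $h_1, \ldots, h_{2n-1}$.

```python
import numpy as np

n = 4
h = np.arange(1.0, 2 * n)                                       # h_1, ..., h_{2n-1} = 1, ..., 7
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])  # H_{n,n}
J = np.fliplr(np.eye(n))                                        # backward identity J_n of (2.39)
T = J @ H

# Toeplitz structure: t_ij = h_{n-i+j} (1-based), i.e. T[i, j] = h[n - 1 - i + j] (0-based)
T_expected = np.array([[h[n - 1 - i + j] for j in range(n)] for i in range(n)])
print(np.allclose(T, T_expected))                               # True
print(np.allclose(J, J.T), np.allclose(J @ J, np.eye(n)))       # True True
```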
Lemma 2.14. The infinite Hankel matrix of (2.35) has finite rank $r$ if and only if there exist $r$ real numbers $\alpha_1, \alpha_2, \ldots, \alpha_r$ such that

$$h_i = \sum_{k=1}^{r} \alpha_kh_{i-k}, \qquad i = r+1, r+2, \ldots \tag{2.40}$$

Moreover, $r$ is the least number with this property.

Proof. Suppose that $\mathrm{rank}(H) = r$ holds. Then the first $r + 1$ rows $R_1, R_2, \ldots, R_{r+1}$ are linearly dependent. Hence, there exists an $l$ $(\le r)$ such that $R_1, \ldots, R_l$ are linearly independent, and $R_{l+1}$ is expressed as their linear combination

$$R_{l+1} = \sum_{k=1}^{l} \alpha_kR_{l-k+1}$$

Now consider the row vectors $R_{i+1}, R_{i+2}, \ldots, R_{i+l+1}$, where $i$ is an arbitrary nonnegative integer. From the structure of $H$, these vectors are obtained by removing the first $i$ elements from $R_1, R_2, \ldots, R_{l+1}$, respectively. Thus we have

$$R_{i+l+1} = \sum_{k=1}^{l} \alpha_kR_{i+l-k+1}, \qquad i = 0, 1, 2, \ldots \tag{2.41}$$

It therefore follows that any row vector of $H$ below the $(l+1)$th row can be expressed as a linear combination of the $l$ preceding row vectors, and hence in terms of the linearly independent first $l$ row vectors. Thus $\mathrm{rank}(H) = l$, i.e., $l = r$, and (2.41) with $l$ replaced by $r$ gives (2.40).

Conversely, suppose that (2.40) holds. Then all the rows (columns) of $H$ are expressed as linear combinations of the first $r$ rows (columns). Thus all the minors of $H$ whose orders are greater than $r$ are zero, and $H$ has rank at most $r$. But the rank cannot be smaller than $r$; otherwise, by the first part of the proof, (2.40) would be satisfied with a smaller value of $r$. This contradicts the second condition of the lemma.
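Lemma 2.14 can be illustrated with a sequence generated by a second-order recurrence of the form (2.40). The finite Hankel matrices built from such a sequence have numerical rank 2. The coefficients, the initial values, and the relative threshold $10^{-10}$ used to count singular values are illustrative assumptions.

```python
import numpy as np

# Illustrative recurrence h_i = a1 h_{i-1} + a2 h_{i-2}, i.e. (2.40) with r = 2
a1, a2 = 0.4, 0.45               # z^2 - a1 z - a2 = (z - 0.9)(z + 0.5)
h = [1.0, 0.7]                   # arbitrary initial values h_1, h_2
while len(h) < 15:
    h.append(a1 * h[-1] + a2 * h[-2])
h = np.array(h)

H = np.array([[h[i + j] for j in range(8)] for i in range(8)])   # H_{8,8} uses h_1, ..., h_15
s = np.linalg.svd(H, compute_uv=False)
rank = int(np.sum(s > 1e-10 * s[0]))
print(rank)                      # 2
```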


The above result is a basis for the realization theory due to Ho and Kalman [72], 
to be discussed in Chapter 3, where a matrix version of Lemma 2.14 will be proved. 


2.9 Notes and References 


• In this chapter, we have presented basic facts related to numerical linear algebra which will be needed in later chapters, including the QR decomposition, the orthogonal and oblique projections, the SVD, the least-squares method, and the rank of Hankel matrices. Problems at the end of the chapter include some useful formulas and results to be used in this book.

• Main references used are Golub and Van Loan [59], Gantmacher [51], Horn and Johnson [73], and Trefethen and Bau [157]. Earlier papers that have dealt with the issues of numerical linear algebra in system theory are [94] and [122]; see also the preprint book [125].

• For the history of the SVD and related numerical methods, see [60, 148, 165]. The early developments in statistics, including the least-squares method and the measurement of uncertainties, are covered in [149].


2.10 Problems 


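2.1 Prove the following by using the SVD, where $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times p}$.
(a) $\mathrm{Im}(A) \oplus \mathrm{Ker}(A^T) = \mathbb{R}^m$, $\mathrm{Im}(A^T) \oplus \mathrm{Ker}(A) = \mathbb{R}^n$
(b) $\mathrm{Ker}(A^T) = (\mathrm{Im}\,A)^\perp$, $\mathrm{Im}(A^T) = (\mathrm{Ker}\,A)^\perp$
(c) $\mathrm{Im}(A) = A\mathbb{R}^n = \mathrm{Im}(AA^T)$, $A\,\mathrm{Im}(B) = \mathrm{Im}(AB)$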


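2.2 Prove the following matrix identities

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ CA^{-1} & I \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}\begin{bmatrix} I & A^{-1}B \\ 0 & I \end{bmatrix}$$
$$= \begin{bmatrix} I & BD^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I & 0 \\ D^{-1}C & I \end{bmatrix}$$

where it is assumed that $A^{-1}$ and $D^{-1}$ exist.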


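2.3 (a) Using the above results, prove the following formula for the determinant of the block matrix:

$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A)\det(D - CA^{-1}B) = \det(D)\det(A - BD^{-1}C)$$

(b) Defining $A = I_n$ and $D = I_m$, show that

$$\det(I_m - CB) = \det(I_n - BC)$$

(c) Prove the formulas for the inverses of block matrices

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B\Delta^{-1}CA^{-1} & -A^{-1}B\Delta^{-1} \\ -\Delta^{-1}CA^{-1} & \Delta^{-1} \end{bmatrix}$$
$$= \begin{bmatrix} \Pi^{-1} & -\Pi^{-1}BD^{-1} \\ -D^{-1}C\Pi^{-1} & D^{-1} + D^{-1}C\Pi^{-1}BD^{-1} \end{bmatrix}$$

where $\Delta := D - CA^{-1}B$ and $\Pi := A - BD^{-1}C$. For $C = 0$, we get

$$\begin{bmatrix} A & B \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}BD^{-1} \\ 0 & D^{-1} \end{bmatrix}$$

(d) Prove the matrix inversion lemma

$$[A + BDC]^{-1} = A^{-1} - A^{-1}B[D^{-1} + CA^{-1}B]^{-1}CA^{-1}$$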
2.4 Show without using the result of Example 2.1 that if $P$ is idempotent ($P^2 = P$), then all its eigenvalues are either zero or one.

2.5 For $P \in \mathbb{R}^{n\times n}$, show that the following statements are equivalent.
(a) $P^2 = P$
(b) $\mathrm{Im}(P) \oplus \mathrm{Im}(I_n - P) = \mathbb{R}^n$
(c) $\mathrm{rank}(P) + \mathrm{rank}(I_n - P) = n$

2.6 Suppose that $Z = [T\ U] \in \mathbb{R}^{n\times n}$ is nonsingular, where $T \in \mathbb{R}^{n\times r}$, $U \in \mathbb{R}^{n\times(n-r)}$. Let the inverse matrix of $Z$ be given by

$$Z^{-1} = \begin{bmatrix} L \\ V \end{bmatrix}, \qquad L \in \mathbb{R}^{r\times n}, \quad V \in \mathbb{R}^{(n-r)\times n}$$

Then it follows that $TL + UV = I_n$, and

$$Z^{-1}Z = \begin{bmatrix} LT & LU \\ VT & VU \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & I_{n-r} \end{bmatrix}$$

Show that $P := TL$ is the oblique projection onto $\mathrm{Im}(T)$ along $\mathrm{Ker}(L)$, and that $Q := UV$ is the oblique projection onto $\mathrm{Ker}(L)$ $[= \mathrm{Im}(U)]$ along $\mathrm{Im}(T)$ $[= \mathrm{Ker}(V)]$.

2.7 In the above problem, define

I, _ | -x
wells Pele)

Compute the projection $P = TL$ by means of $L$ and $V$. Show that (2.16) is satisfied if $P$ has the following representation

2.8 By using (2.29), prove the following.
(a) $V_rV_r^T$: the orthogonal projection from $\mathbb{R}^n$ onto $\mathrm{Im}(A^T)$
(b) $\bar{V}_r\bar{V}_r^T$: the orthogonal projection from $\mathbb{R}^n$ onto $\mathrm{Ker}(A)$
(c) $U_rU_r^T$: the orthogonal projection from $\mathbb{R}^m$ onto $\mathrm{Im}(A)$
(d) $\bar{U}_r\bar{U}_r^T$: the orthogonal projection from $\mathbb{R}^m$ onto $\mathrm{Ker}(A^T)$

2.9 For $A \in \mathbb{R}^{m\times n}$, show that $A^T(AA^T)^\dagger = A^\dagger$ and $(A^TA)^\dagger A^T = A^\dagger$.

2.10 Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. By using the SVD, show that there exist a matrix $Q \in \mathbb{R}^{m\times n}$ with orthonormal columns ($Q^TQ = I_n$) and a nonnegative definite matrix $\Gamma \in \mathbb{R}^{n\times n}$ such that $A = Q\Gamma$.
Discrete-Time Linear Systems


This chapter reviews discrete-time LTI systems and related basic results, including stability, norms of signals and systems, state space equations, the Lyapunov stability theory, reachability, and observability. Moreover, we consider the canonical structure of linear systems, balanced realization, model reduction, and realization theory.


3.1 z-Transform 


Let $f = (f(0), f(1), \ldots)$ be a one-sided infinite sequence, or a one-sided signal. Let $z$ be a complex variable, and define

$$F(z) = \sum_{k=0}^{\infty} f(k)z^{-k} \tag{3.1}$$

It follows from the theory of power series that there exists $\rho \ge 0$ such that $F(z)$ converges absolutely for $|z| > \rho$, but diverges for $|z| < \rho$. Then $\rho$ is called the radius of convergence, and $|z| = \rho$ is the circle of convergence. If the power series on the right-hand side of (3.1) converges, $F(z)$ is called the one-sided z-transform of $f$, and is written as

$$F(z) = \mathcal{Z}[f](z) \tag{3.2}$$

Also, let $f = (\ldots, f(-1), f(0), f(1), \ldots)$ be a two-sided infinite sequence, or a two-sided signal. Then, if

$$F(z) = \sum_{k=-\infty}^{\infty} f(k)z^{-k} \tag{3.3}$$

converges, $F(z)$ is called the two-sided z-transform of $f$. If the two-sided transform exists, it converges in an annular domain $\rho_1 < |z| < \rho_2$.

It is obvious that the one-sided z-transform is nothing but the two-sided transform of a sequence $f$ with $f(k) = 0$, $k = -1, -2, \ldots$. Thus both transforms are expressed as in (3.2).
Lemma 3.1. Let $A, r > 0$. If the one-sided signal $f$ satisfies

$$|f(k)| \le Ar^k, \qquad k = 0, 1, \ldots$$

then the z-transform $\mathcal{Z}[f](z)$ is absolutely convergent for $|z| > r$, and is analytic therein.

Proof. The absolute convergence is clear from

$$\sum_{k=0}^{\infty} |f(k)z^{-k}| \le A\sum_{k=0}^{\infty}\left(\frac{r}{|z|}\right)^k = \frac{A}{1 - r/|z|} < \infty, \qquad r/|z| < 1$$

A proof of analyticity is omitted [36].

Similarly, if the two-sided signal $f = (\ldots, f(-1), f(0), f(1), \ldots)$ satisfies

$$|f(k)| \le \begin{cases} Ar_1^k, & k = 0, 1, \ldots \\ Ar_2^k, & k = -1, -2, \ldots \end{cases} \qquad 0 < r_1 < r_2$$

then the two-sided transform $F(z)$ is absolutely convergent for $r_1 < |z| < r_2$, and is analytic therein.


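Example 3.1. (a) Consider the step function defined by

$$1(k) = \begin{cases} 1, & k = 0, 1, \ldots \\ 0, & k = -1, -2, \ldots \end{cases}$$

Then the z-transform of $1(k)$ is given by

$$\mathcal{Z}[1](z) = \sum_{k=0}^{\infty} z^{-k} = \frac{1}{1 - z^{-1}} = \frac{z}{z - 1}, \qquad |z| > 1$$

(b) For the (one-sided) exponential function $f(k) = a^k$, $k = 0, 1, \ldots$,

$$\mathcal{Z}[f](z) = \sum_{k=0}^{\infty} a^kz^{-k} = \frac{1}{1 - az^{-1}} = \frac{z}{z - a}, \qquad |z| > |a|$$

(c) Let the two-sided exponential function $f$ be defined by

$$f(k) = \begin{cases} a^k, & k = 0, 1, \ldots \\ b^k, & k = -1, -2, \ldots \end{cases}$$

where $0 < a < 1 < b$. Then the two-sided transform is given by

$$\mathcal{Z}[f](z) = \sum_{k=0}^{\infty} a^kz^{-k} + \sum_{k=-\infty}^{-1} b^kz^{-k} = \frac{z}{z - a} + \frac{z}{b - z}, \qquad a < |z| < b$$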
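As a numerical check of Example 3.1 (b), a truncated sum of (3.1) can be compared with the closed form $z/(z - a)$ at a point inside the region of convergence $|z| > |a|$. The values of $a$, $z$, and the truncation length are illustrative.

```python
import numpy as np

def z_transform_truncated(f, z):
    """sum_{k=0}^{K-1} f(k) z^(-k) for a finite array f of length K."""
    k = np.arange(len(f))
    return np.sum(f * z ** (-k))

a = 0.8
z = 1.5 * np.exp(0.3j)            # |z| = 1.5 > |a|
f = a ** np.arange(200)           # f(k) = a^k, truncated after 200 terms
approx = z_transform_truncated(f, z)
exact = z / (z - a)               # closed form of Example 3.1 (b)
print(abs(approx - exact) < 1e-12)   # True
```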
Lemma 3.2. The inverse transform of $F(z)$ is given by the formula

$$f(k) = \frac{1}{2\pi j}\oint_C F(z)z^{k-1}\,dz, \qquad k = 0, \pm 1, \ldots \tag{3.4}$$

where $C$ denotes a closed curve containing all the poles $\alpha_i$, $i = 1, \ldots, p$ of $F(z)$. Thus $f(k)$ is also obtained by

$$f(k) = \sum_{i=1}^{p} \mathrm{Res}\left[F(z)z^{k-1},\ z = \alpha_i\right], \qquad k = 0, \pm 1, \ldots$$

which is the sum of the residues of $F(z)z^{k-1}$ at the poles contained in $C$.

Proof. See [98, 121].


Lemma 3.3. (Properties of z-transform)

(i) (Linearity)

$$\mathcal{Z}[\alpha f + \beta g](z) = \alpha\mathcal{Z}[f](z) + \beta\mathcal{Z}[g](z), \qquad \alpha, \beta: \text{scalars}$$

(ii) (Time shift) Let $f$ be a one-sided signal with $f(k) = 0$, $k = -1, -2, \ldots$. Let $\sigma$ be the shift operator defined by $(\sigma f)(k) = f(k+1)$. Then the z-transforms of $\sigma^{-i}f$ and $\sigma^if$ are given by

$$\mathcal{Z}[\sigma^{-i}f](z) = z^{-i}F(z), \qquad \mathcal{Z}[\sigma^if](z) = z^i\left[F(z) - \sum_{k=0}^{i-1} f(k)z^{-k}\right], \qquad i = 1, 2, \ldots$$

It should be noted that for the two-sided case, the finite sum $\sum_{k=0}^{i-1} f(k)z^{-k}$ does not appear in the above formula.


(iii) (Convolution) Consider the convolution of two-sided signals $f$ and $g$, i.e.,

$$h(k) = \sum_{l=-\infty}^{\infty} f(l)g(k-l) = \sum_{l=-\infty}^{\infty} f(k-l)g(l)$$

Let the z-transforms of $f$ and $g$ be absolutely convergent and be given, respectively, by

$$F(z) = \sum_{k=-\infty}^{\infty} f(k)z^{-k}, \qquad \rho_1 < |z| < \rho_2$$

and

$$G(z) = \sum_{k=-\infty}^{\infty} g(k)z^{-k}, \qquad \rho_3 < |z| < \rho_4$$

Then the z-transform of $h$ is absolutely convergent and is given by

$$H(z) = F(z)G(z), \qquad \rho^- < |z| < \rho^+ \tag{3.5}$$

where $\rho^- := \max(\rho_1, \rho_3)$ and $\rho^+ := \min(\rho_2, \rho_4)$.




(iv) Let the partial sum of a one-sided signal $f$ be given by $g(k) := f(0) + f(1) + \cdots + f(k)$. Then the z-transform of $g$ has the form

$$\mathcal{Z}[g](z) = \frac{1}{1 - z^{-1}}F(z) \tag{3.6}$$

(v) For the difference of $f$, i.e., $g(k) := f(k) - f(k-1)$, we have

$$\mathcal{Z}[g](z) = (1 - z^{-1})F(z) \tag{3.7}$$


Proof. See [98, 121]. 
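For finitely supported one-sided signals, the z-transforms are polynomials in $z^{-1}$. The convolution property (3.5) and the difference property (3.7) can therefore be checked exactly at any $z \ne 0$. A short NumPy sketch with random data:

```python
import numpy as np

def ztrans(x, z):
    """z-transform of a finitely supported one-sided signal x(0), ..., x(K-1)."""
    return np.sum(x * z ** (-np.arange(len(x))))

rng = np.random.default_rng(9)
f = rng.standard_normal(5)        # f(0), ..., f(4)
g = rng.standard_normal(4)        # g(0), ..., g(3)
h = np.convolve(f, g)             # h(k) = sum_l f(l) g(k - l), k = 0, ..., 7
z = 0.7 + 1.1j                    # any z != 0 works for finite sequences

print(np.isclose(ztrans(h, z), ztrans(f, z) * ztrans(g, z)))    # True, property (3.5)

d = np.append(f, 0.0) - np.concatenate(([0.0], f))   # d(k) = f(k) - f(k-1), k = 0, ..., 5
print(np.isclose(ztrans(d, z), (1 - 1 / z) * ztrans(f, z)))      # True, property (3.7)
```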


In the following, it is necessary to consider the case where

$$f = (\ldots, f(-1), f(0), f(1), \ldots)$$

is a sequence of vectors or matrices. For a vector or matrix case, the z-transform is defined componentwise. For example, let $A(t) = (a_{ij}(t))$, $t = 0, \pm 1, \ldots$, where $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Then the z-transform of the matrix function $A(t)$ is defined by

$$\mathcal{Z}[A](z) = \begin{bmatrix} \mathcal{Z}[a_{11}](z) & \cdots & \mathcal{Z}[a_{1n}](z) \\ \vdots & \ddots & \vdots \\ \mathcal{Z}[a_{m1}](z) & \cdots & \mathcal{Z}[a_{mn}](z) \end{bmatrix}$$


3.2 Discrete-Time LTI Systems 


Figure 3.1. Discrete-time LTI system $G(z)$ with input $u$ and output $y$


Consider a single-input, single-output (SISO) discrete-time LTI system shown in Figure 3.1, where $u$ is the input and $y$ the output. We assume that the system is at rest for $t = -1, -2, \ldots$, i.e., $u(t) = 0$, $y(t) = 0$, $t < 0$. Then the output is expressed as a convolution of the form

$$y(t) = \sum_{k=0}^{t} g(k)u(t-k), \qquad t = 0, 1, \ldots \tag{3.8}$$

where $g = (g(0), g(1), \ldots)$ is the impulse response of the system. An impulse response sequence satisfying $g(t) = 0$, $t = -1, -2, \ldots$ is called physically realizable, or causal, because a physical system cannot respond before an input is applied.
Let the z-transform of the impulse response $g$ be given by

$$G(z) = \sum_{k=0}^{\infty} g(k)z^{-k}, \qquad |z| > \rho \tag{3.9}$$

We say that $G(z)$ is the transfer function from $u$ to $y$. Let the z-transforms of $u$ and $y$ be denoted by $u(z)$ and $y(z)$, respectively¹. It then follows from (3.8) and Lemma 3.3 (iii) that

$$y(z) = G(z)u(z) \tag{3.10}$$


Example 3.2. Consider a difference equation of the form

$$y(t) + a_1y(t-1) + a_2y(t-2) + a_3y(t-3) = b_1u(t-1) + b_2u(t-2) + b_3u(t-3)$$

Taking the z-transform of the above equation under the assumption that all the initial values are zero, we get the transfer function

$$G(z) = \frac{b_1z^2 + b_2z + b_3}{z^3 + a_1z^2 + a_2z + a_3}$$

This is a rational function in $z$, so that $G(z)$ is called a rational transfer function.
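The difference equation of Example 3.2 can be simulated directly. Its response can then be compared with the convolution (3.8), using the impulse response obtained by applying a unit pulse. The coefficient values below are hypothetical.

```python
import numpy as np

def simulate(a, b, u):
    """Simulate y(t) + a1 y(t-1) + a2 y(t-2) + a3 y(t-3) = b1 u(t-1) + b2 u(t-2) + b3 u(t-3)
    with zero initial values, i.e. u(t) = y(t) = 0 for t < 0."""
    y = np.zeros(len(u))
    for t in range(len(u)):
        for k in range(1, 4):
            if t - k >= 0:
                y[t] += -a[k - 1] * y[t - k] + b[k - 1] * u[t - k]
    return y

a = [-0.5, 0.2, -0.05]          # hypothetical a1, a2, a3
b = [1.0, 0.5, 0.25]            # hypothetical b1, b2, b3
T = 50

pulse = np.zeros(T)
pulse[0] = 1.0
g = simulate(a, b, pulse)       # impulse response g(0), g(1), ..., g(T-1)

rng = np.random.default_rng(10)
u = rng.standard_normal(T)
y_direct = simulate(a, b, u)
y_conv = np.array([sum(g[k] * u[t - k] for k in range(t + 1)) for t in range(T)])   # (3.8)
print(np.allclose(y_direct, y_conv))   # True
print(g[0])                            # 0.0: the numerator has lower degree than the denominator
```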


Most transfer functions treated in this book are rational, so that $G(z)$ is expressed as a ratio of two polynomials

$$G(z) = \frac{b(z)}{a(z)}, \qquad \deg b(z) \le \deg a(z) \tag{3.11}$$

where $a(z)$ and $b(z)$ are polynomials in $z$. We say that a transfer function $G(z)$ with $\deg b(z) \le \deg a(z)$ is proper. It should be noted that since $g(t) = 0$, $t < 0$, a rational transfer function $G(z)$ of the form (3.9) is always proper.


Definition 3.1. Consider the discrete-time LTI system with the transfer function 
G(z) shown in Figure 3.1. We say that the system is bounded-input bounded-output 
(BIBO) stable if for any bounded signal u, the output y is bounded. In this case, we 
simply say that the system is stable, or G(z) is stable. 


Theorem 3.1. The discrete-time LTI system shown in Figure 3.1 is stable if and only if the impulse response is absolutely summable, i.e.,

$$\sum_{l=0}^{\infty} |g(l)| < \infty \tag{3.12}$$

Proof. (Sufficiency) Let $u$ be a bounded input with $|u(t)| \le M$. Then it follows from (3.8) that


¹For simplicity, we do not use the hat notation like $\hat{u}(z)$ and $\hat{y}(z)$ in this book.
$$|y(t)| \le \sum_{l=0}^{t} |g(l)|\,|u(t-l)| \le M\sum_{l=0}^{\infty} |g(l)| < \infty$$

(Necessity) Suppose that the absolute sum in (3.12) diverges. Let $M_k$, $k = 1, 2, \ldots$ be a divergent sequence. Then there exists a divergent sequence $t_k$, $k = 1, 2, \ldots$ such that $\sum_{l=0}^{t_k} |g(l)| > M_k$, $k = 1, 2, \ldots$. Define the input $\tilde{u}_k$ by

$$\tilde{u}_k(t_k - l) = \begin{cases} 1, & g(l) \ge 0 \\ -1, & g(l) < 0 \end{cases} \qquad l = 0, 1, \ldots, t_k$$

Then we have

$$y(t_k) = \sum_{l=0}^{t_k} g(l)\tilde{u}_k(t_k - l) = \sum_{l=0}^{t_k} |g(l)| > M_k, \qquad k = 1, 2, \ldots$$

This implies that if the absolute sum of the impulse response diverges, we can make the output diverge by using the bounded inputs $\tilde{u}_k$ (with $|\tilde{u}_k(t)| \le 1$), so that the system is unstable. This completes the proof of the theorem.


In the following, a number $\lambda \in \mathbb{C}$ is called a pole of $G(z)$ if $G(\lambda) = \infty$, and a zero of $G(z)$ if $G(\lambda) = 0$.


Theorem 3.2. A discrete-time LTI system with a proper transfer function G(z) is 
stable if and only if all the poles of the transfer function lie inside the unit disk. 
Proof. Let $\alpha_1, \ldots, \alpha_p$ be the poles of $G(z)$. We assume for simplicity that they are distinct and nonzero. Partial fraction expansion of $G(z)/z$, with $G(z)$ given by (3.11), yields

$$\frac{G(z)}{z} = \frac{A_0}{z} + \frac{A_1}{z - \alpha_1} + \cdots + \frac{A_p}{z - \alpha_p}$$

Since $G(z) = A_0 + \sum_{i=1}^{p} A_iz/(z - \alpha_i)$ is absolutely convergent for $|z| > \max_i|\alpha_i|$, the inverse z-transform is given by

$$g(t) = A_0\delta_{t0} + A_1\alpha_1^t + \cdots + A_p\alpha_p^t, \qquad t = 0, 1, \ldots$$

where $\delta_{t0}$ is the Kronecker delta. Now suppose that $|\alpha_i| < 1$, $i = 1, \ldots, p$. Then we have

$$\sum_{t=0}^{\infty} |g(t)| \le |A_0| + \sum_{i=1}^{p}\sum_{t=0}^{\infty} |A_i||\alpha_i|^t < \infty$$

Thus it follows from Theorem 3.1 that the system is stable. Conversely, suppose that at least one $\alpha_i$ lies on or outside the unit circle. Without loss of generality, we assume that $|\alpha_1| \ge 1$ and $|\alpha_i| < 1$, $i = 2, \ldots, p$. Then it follows that

$$|g(t)| \ge |A_1||\alpha_1|^t - \sum_{i=2}^{p} |A_i||\alpha_i|^t - |A_0|\delta_{t0}$$

Note that the sum over $t$ of the first term on the right-hand side of the above inequality diverges, while those of the second and third terms converge. Thus we have $\sum_{t=0}^{\infty} |g(t)| = \infty$, showing that the system is unstable.
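By Theorem 3.2, checking the stability of a rational $G(z) = b(z)/a(z)$ amounts to checking that the roots of $a(z)$ lie inside the unit disk. This sketch assumes that $b(z)$ and $a(z)$ have no common roots (no pole-zero cancellation). The denominators tested are illustrative.

```python
import numpy as np

def is_stable(den):
    """True if every root of a(z) lies strictly inside the unit disk.
    den holds the coefficients of a(z) in descending powers of z."""
    return bool(np.all(np.abs(np.roots(den)) < 1.0))

print(is_stable([1.0, -0.5, 0.2, -0.05]))   # True: roots have modulus about 0.34 and 0.38
print(is_stable([1.0, -1.7, 0.6]))          # False: (z - 1.2)(z - 0.5) has a root at 1.2
```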
3.3 Norms of Signals and Systems


We begin with norms of signals. Let $u(t)$, $t = 0, \pm 1, \ldots$ be $m$-dimensional vectors, and let $u = (\ldots, u(-1), u(0), u(1), \ldots)$ be a two-sided signal. The 2-norm of $u$ is then defined by

$$\|u\|_2 = \left(\sum_{t=-\infty}^{\infty} \|u(t)\|^2\right)^{1/2}$$

where $\|\cdot\|$ denotes the Euclidean norm of a vector. The set of signals $u$ with finite 2-norm is a Hilbert space denoted by

$$l_2(-\infty, \infty) = \{u \mid \|u\|_2 < \infty\}$$

If the signal is one-sided, i.e., $u(t) = 0$, $t < 0$, then the space is denoted by $l_2[0, \infty)$. For $u \in l_2(-\infty, \infty)$, the Fourier transform, or the two-sided z-transform evaluated on the unit circle, is defined by

$$u(z) = \sum_{t=-\infty}^{\infty} u(t)z^{-t}, \qquad z = e^{j\omega}$$

Then, by Parseval's theorem, the 2-norm of $u$ in the frequency domain is expressed as

$$\|u\|_2 = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} u^H(e^{j\omega})u(e^{j\omega})\,d\omega\right)^{1/2} = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \|u(e^{j\omega})\|^2\,d\omega\right)^{1/2}$$

where $u^H(e^{j\omega}) = u^T(e^{-j\omega})$ denotes the complex conjugate transpose.

We now consider a stable discrete-time LTI system with input $u \in \mathbb{R}^m$ and output $y \in \mathbb{R}^p$. Let $G(z)$ be a $p \times m$ transfer matrix, and $G_{ij}(z)$ the $(i, j)$-element of $G(z)$. Then, if all the elements $G_{ij}(z)$ are BIBO stable, we simply say that $G(z)$ is stable.


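Definition 3.2. For a stable $p \times m$ transfer matrix $G(z)$, two different norms can be defined:

(i) $H_2$-norm:
$$\|G\|_2 = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{trace}\left[G^T(e^{-j\omega})G(e^{j\omega})\right]d\omega\right)^{1/2}$$
where $\mathrm{trace}[\,\cdot\,]$ denotes the trace of a matrix.

(ii) $H_\infty$-norm:
$$\|G\|_\infty = \sup_{-\pi < \omega \le \pi} \bar{\sigma}\left[G(e^{j\omega})\right]$$
where $\bar{\sigma}[\,\cdot\,]$ denotes the maximum singular value of a matrix. Also, the $H_\infty$-norm can be expressed as
$$\|G\|_\infty = \sup_{u \ne 0} \frac{\|Gu\|_2}{\|u\|_2} = \sup_{u \ne 0} \frac{\|y\|_2}{\|u\|_2}, \qquad u \in l_2[0, \infty)$$
This is called the $l_2$-induced norm.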
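For the illustrative first-order example $G(z) = 1/(z - a)$ with $0 < a < 1$, both norms are known in closed form: $\|G\|_\infty = 1/(1 - a)$ and $\|G\|_2 = 1/\sqrt{1 - a^2}$. The sketch below approximates them on a uniform frequency grid. Because the example is SISO, $\bar{\sigma}[G(e^{j\omega})] = |G(e^{j\omega})|$. A MIMO version would instead take the largest singular value at each frequency.

```python
import numpy as np

a = 0.5
N = 4096
w = -np.pi + 2 * np.pi * np.arange(N) / N          # uniform grid on [-pi, pi)
G = 1.0 / (np.exp(1j * w) - a)                     # G(z) = 1/(z - a) on |z| = 1

h_inf = np.max(np.abs(G))                          # sup over the grid (includes w = 0)
h_2 = np.sqrt(np.mean(np.abs(G) ** 2))             # mean over a periodic grid ~ (1/2pi) * integral

print(np.isclose(h_inf, 1 / (1 - a)))              # True, peak at w = 0
print(np.isclose(h_2, 1 / np.sqrt(1 - a ** 2)))    # True

# Time-domain check of the H2 norm: g(0) = 0 and g(t) = a^(t-1) for t >= 1
g = np.concatenate(([0.0], a ** np.arange(100)))
print(np.isclose(h_2, np.sqrt(np.sum(g ** 2))))    # True
```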
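Lemma 3.4. Suppose that $G(z)$ is stable and satisfies $\|G\|_\infty < \infty$.

(i) If $u \in l_2(-\infty, \infty)$, then the output satisfies $y \in l_2(-\infty, \infty)$.
(ii) Let the z-transforms of $u$ and $y$ be given by $u(z)$ and $y(z)$, respectively. Then the inner product $(y, u)$ is expressed as

$$\sum_{k=-\infty}^{\infty} y^T(k)u(k) = \frac{1}{2\pi j}\oint_{|z|=1} u^T(z)G^T(z)u(z^{-1})\frac{dz}{z} \tag{3.13a}$$
$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} u^T(e^{j\omega})G^T(e^{j\omega})u(e^{-j\omega})\,d\omega \tag{3.13b}$$

Proof. (i) Since $y(z) = G(z)u(z)$, it follows that

$$\|y\|_2^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} y^H(e^{j\omega})y(e^{j\omega})\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} u^H(e^{j\omega})G^H(e^{j\omega})G(e^{j\omega})u(e^{j\omega})\,d\omega$$
$$\le \|G\|_\infty^2\,\frac{1}{2\pi}\int_{-\pi}^{\pi} u^H(e^{j\omega})u(e^{j\omega})\,d\omega = \|G\|_\infty^2\|u\|_2^2$$

Thus we get $\|y\|_2 \le \|G\|_\infty\|u\|_2 < \infty$.

(ii) From item (i), if $u \in l_2(-\infty, \infty)$, then $y \in l_2(-\infty, \infty)$, so that by the Cauchy-Schwarz inequality the sum on the left-hand side of (3.13a) converges. It follows from the inversion formula of Lemma 3.2 that

$$\sum_{k=-\infty}^{\infty} y^T(k)u(k) = \sum_{k=-\infty}^{\infty}\left(\frac{1}{2\pi j}\oint_{|z|=1} y^T(z)z^{k-1}\,dz\right)u(k) = \frac{1}{2\pi j}\oint_{|z|=1} y^T(z)u(z^{-1})\frac{dz}{z}$$

Since $y(z) = G(z)u(z)$, we get (3.13a). Letting $z = e^{j\omega}$ $(-\pi \le \omega \le \pi)$ gives (3.13b).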


3.4 State Space Systems 


Consider an $m$-input, $p$-output discrete-time LTI system described by

$$x(t+1) = Ax(t) + Bu(t) \tag{3.14a}$$
$$y(t) = Cx(t) + Du(t), \qquad t = 0, 1, \ldots \tag{3.14b}$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ the input vector, and $y \in \mathbb{R}^p$ the output vector. The matrices $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{p\times n}$, $D \in \mathbb{R}^{p\times m}$ are constant. Given the initial condition $x(0)$ and the inputs $u(t)$, $t = 0, 1, \ldots$, the state vectors $x(t)$, $t = 1, 2, \ldots$ are recursively obtained, and hence the outputs $y(t)$, $t = 0, 1, \ldots$ are determined. In the following, we simply write $\Sigma = (A, B, C, D)$ for the LTI system described by (3.14).

By solving (3.14), we get

$$y(t) = CA^tx(0) + Du(t) + \sum_{i=0}^{t-1} CA^{t-i-1}Bu(i), \qquad t = 0, 1, \ldots$$

If $u(t) = 0$, $t = 0, 1, \ldots$, the above equation reduces to

$$y(t) = CA^tx(0), \qquad t = 0, 1, \ldots \tag{3.15}$$

This is called the zero-input response. Also, if $x(0) = 0$, we have

$$y(t) = Du(t) + \sum_{i=0}^{t-1} CA^{t-i-1}Bu(i), \qquad t = 0, 1, \ldots \tag{3.16}$$

which is the response due to the external input $u(t)$, and is called the zero-state response. Thus the response of a linear state space system can always be expressed as the sum of the zero-input response and the zero-state response.

In connection with the zero-state response, we define the $p \times m$ matrices

$$G_t = \begin{cases} D, & t = 0 \\ CA^{t-1}B, & t = 1, 2, \ldots \end{cases} \tag{3.17}$$

The sequence $(G_0, G_1, \ldots)$ is called the impulse response, or the Markov parameters, of the LTI system $\Sigma = (A, B, C, D)$. Taking the z-transform of the impulse response, we have

$$G(z) := \sum_{t=0}^{\infty} G_tz^{-t} = D + C(zI - A)^{-1}B \tag{3.18}$$

which is called the transfer matrix of the LTI system $\Sigma = (A, B, C, D)$.
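The recursion (3.14), the split into zero-input and zero-state responses, the Markov parameters (3.17), and the transfer matrix (3.18) can all be cross-checked numerically. The system matrices below are hypothetical. $A$ is chosen upper triangular with eigenvalues $0.5, 0.3, -0.4$, so $\rho(A) = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, p, T = 3, 2, 1, 30
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.3, 0.2], [0.0, 0.0, -0.4]])   # hypothetical, rho(A) = 0.5
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
x0 = rng.standard_normal(n)
u = rng.standard_normal((T, m))

# Direct recursion of (3.14a)-(3.14b)
x = x0.copy()
y = np.zeros((T, p))
for t in range(T):
    y[t] = C @ x + D @ u[t]
    x = A @ x + B @ u[t]

# Markov parameters (3.17): G_0 = D, G_t = C A^(t-1) B
G = [D] + [C @ np.linalg.matrix_power(A, t - 1) @ B for t in range(1, T)]

# Zero-input response (3.15) plus zero-state response sum_{i=0}^{t} G_{t-i} u(i)
y_formula = np.array([C @ np.linalg.matrix_power(A, t) @ x0
                      + sum(G[t - i] @ u[i] for i in range(t + 1))
                      for t in range(T)])
print(np.allclose(y, y_formula))    # True

# Transfer matrix (3.18) versus the truncated series sum_t G_t z^(-t) at z = 1.3 > rho(A)
z = 1.3
G_z = D + C @ np.linalg.solve(z * np.eye(n) - A, B)
G_series = sum(G[t] * z ** (-t) for t in range(T))
print(np.allclose(G_z, G_series))   # True, truncation error of order (0.5 / 1.3) ** 30
```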

As shown in Figure 3.1, we can directly access the input and output vectors $u$ and $y$ from outside the system, so these vectors are called external vectors. Hence, the transfer matrix $G(z)$ relating the input vector to the output vector is an external description of the system $\Sigma$. On the other hand, we cannot directly access the state vector appearing in (3.14), since it is inside the system. Thus (3.14) is called an internal description of the LTI system $\Sigma$ with the state vector $x$.

We easily observe that if an internal description of the system $\Sigma = (A, B, C, D)$ is given, the transfer matrix and the impulse response matrices are calculated by means of (3.18) and (3.17), respectively. But, for a given external description $G(z)$, there exist infinitely many internal descriptions that realize it. In fact, let $T \in \mathbb{R}^{n\times n}$ be an arbitrary nonsingular matrix, and define

$$\bar{A} = T^{-1}AT, \qquad \bar{B} = T^{-1}B, \qquad \bar{C} = CT, \qquad \bar{D} = D \tag{3.19}$$

Then a simple computation shows that

$$\begin{aligned} \bar{G}(z) &= \bar{D} + \bar{C}(zI - \bar{A})^{-1}\bar{B} \\ &= D + CT(zI - T^{-1}AT)^{-1}T^{-1}B \\ &= D + C(zI - A)^{-1}B = G(z) \end{aligned}$$

Thus the two internal descriptions $\Sigma = (A, B, C, D)$ and $\bar{\Sigma} = (\bar{A}, \bar{B}, \bar{C}, \bar{D})$ have the same external description. LTI systems $\Sigma$ and $\bar{\Sigma}$ that represent the same input-output relation are called input-output equivalent.
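The invariance of the external description under (3.19) can be checked by comparing Markov parameters and transfer matrices before and after a change of coordinates. The matrices, the random coordinate change $T$ (nonsingular with probability one), and the evaluation point $z$ are illustrative choices. `markov` and `transfer` are hypothetical helper names.

```python
import numpy as np

def markov(A, B, C, D, N):
    """First N Markov parameters (3.17): G_0 = D, G_t = C A^(t-1) B."""
    return [D] + [C @ np.linalg.matrix_power(A, t - 1) @ B for t in range(1, N)]

def transfer(A, B, C, D, z):
    """G(z) = D + C (zI - A)^{-1} B of (3.18)."""
    return D + C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)

rng = np.random.default_rng(12)
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.3, 0.2], [0.0, 0.0, -0.4]])
B = rng.standard_normal((3, 1))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 1))

T = rng.standard_normal((3, 3)) + 3 * np.eye(3)      # change of coordinates
Ti = np.linalg.inv(T)
Ab, Bb, Cb, Db = Ti @ A @ T, Ti @ B, C @ T, D        # transformed system (3.19)

same_markov = all(np.allclose(G1, G2) for G1, G2 in
                  zip(markov(A, B, C, D, 10), markov(Ab, Bb, Cb, Db, 10)))
z = 2.0 + 0.5j
print(same_markov, np.allclose(transfer(A, B, C, D, z), transfer(Ab, Bb, Cb, Db, z)))   # True True
```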

This implies that the models we obtain from input-output data by system identification techniques are necessarily external representations of systems. To get a state space model from a given external representation, we need to specify a coordinate system for the state space.


3.5 Lyapunov Stability 


Let $u = 0$ in (3.14). Then we have the homogeneous system

$$
x(t+1) = A x(t), \quad x(0) = x_0 \qquad (3.20)
$$

A state vector satisfying $x = Ax$ is called an equilibrium point, so the set of equilibrium points is $\{x \mid x = Ax\}$. It is clear that the origin $x = 0$ is an equilibrium point of (3.20). If $\det(I - A) \neq 0$, then $x = 0$ is the unique equilibrium point.


Definition 3.3. If for any $x(0) \in \mathbb{R}^n$ the solution $x(t)$ converges to 0 as $t \to \infty$, then the origin of the system (3.20) is said to be asymptotically stable. In this case, we also say that the system (3.20) is asymptotically stable, and $A$ is simply called stable.


We now prove the Lyapunov stability theorem. 


Theorem 3.3. Each of the following conditions is necessary and sufficient for the homogeneous system (3.20) to be asymptotically stable.

(i) The absolute values of all the eigenvalues of $A$ are less than 1, i.e.,

$$
|\lambda_i(A)| < 1, \quad i = 1, \ldots, n \qquad (3.21)
$$

This is simply written as $\rho(A) < 1$.

(ii) For any $Q > 0$, there exists a unique solution $P > 0$ that satisfies

$$
P = A^T P A + Q \qquad (3.22)
$$


The above matrix equation is called a Lyapunov equation for a discrete-time LTI 
system. 
Proof. (i) From $x(t) = A^t x(0)$, we see that (3.21) is a necessary and sufficient condition for the asymptotic stability of (3.20).

(ii) (Necessity) Suppose that (3.21) holds. Then the sum

$$
P = \sum_{i=0}^{\infty} (A^T)^i Q A^i = Q + A^T \left( \sum_{i=0}^{\infty} (A^T)^i Q A^i \right) A \qquad (3.23)
$$

converges. It is easy to see that $P$ defined above is a solution of (3.22), and that $P > 0$ since $Q > 0$. To prove the uniqueness of $P$, suppose that $P_1$ and $P_2$ are two solutions of (3.22). Then we have

$$
P_1 - P_2 = A^T (P_1 - P_2) A = (A^T)^k (P_1 - P_2) A^k, \quad k = 1, 2, \ldots
$$

Since $A$ is stable, $P_1 = P_2$ follows by taking the limit $k \to \infty$.

(Sufficiency) Suppose that the solution of (3.22) is positive definite, i.e., $P > 0$, but $A$ is not stable. Then there exist an eigenvalue $\lambda_0$ and an eigenvector $\xi \in \mathbb{C}^n$ such that

$$
A \xi = \lambda_0 \xi, \quad |\lambda_0| \geq 1, \quad \xi \neq 0 \qquad (3.24)
$$

Pre-multiplying (3.22) by $\xi^H$ and post-multiplying by $\xi$ yield

$$
\xi^H P \xi = \xi^H A^T P A \xi + \xi^H Q \xi = |\lambda_0|^2 \xi^H P \xi + \xi^H Q \xi
$$

Thus it follows that $(|\lambda_0|^2 - 1)\xi^H P \xi + \xi^H Q \xi = 0$. Since $|\lambda_0| \geq 1$ and $P > 0$, both terms on the left-hand side of this equation are nonnegative, so both must be zero. In particular, $\xi^H Q \xi = 0$, so that $\xi = 0$ since $Q > 0$, a contradiction. Thus $A$ is stable.
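Theorem 3.3 (ii) can be observed on small examples. The sketch below assumes NumPy. The helper `solve_lyapunov_kron` is a hypothetical name that solves (3.22) by vectorization. It returns a positive definite $P$ for a stable $A$, and an indefinite $P$ for an unstable $A$, where the linear system still has a unique solution but that solution is not positive definite.

```python
import numpy as np


def solve_lyapunov_kron(A, Q):
    """Solve P = A^T P A + Q, cf. (3.22), via vectorization (hypothetical helper).

    Row-major vec gives vec(A^T P A) = (A^T kron A^T) vec(P). The cost is O(n^6),
    so this is only meant for small illustrative examples.
    """
    n = A.shape[0]
    K = np.eye(n * n) - np.kron(A.T, A.T)
    P = np.linalg.solve(K, Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


A_stable = np.array([[0.9, 0.4], [-0.2, 0.5]])    # complex eigenvalues of modulus ~0.73
A_unstable = np.array([[1.1, 0.0], [0.3, 0.2]])   # eigenvalues 1.1 and 0.2
Q = np.eye(2)

P_s = solve_lyapunov_kron(A_stable, Q)
P_u = solve_lyapunov_kron(A_unstable, Q)
print(np.linalg.eigvalsh(P_s))  # all positive, as Theorem 3.3 (ii) predicts
print(np.linalg.eigvalsh(P_u))  # one negative eigenvalue: no P > 0 exists
```

In practice, a dedicated solver such as a Bartels-Stewart type routine would replace the $O(n^6)$ Kronecker approach.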


3.6 Reachability and Observability 


In this section, we present basic definitions and theorems for reachability and observability of the discrete-time LTI system $\Sigma = (A, B, C, D)$.


Definition 3.4. Consider a discrete-time LTI system $\Sigma$. If the initial state vector $x(0) = 0$ can be transferred to any state $\xi \in \mathbb{R}^n$ at time $n$, i.e., $x(n) = \xi$, by means of a sequence of control vectors $u(0), u(1), \ldots, u(n-1)$, then the system is called reachable. Also, if any state $x(0) \in \mathbb{R}^n$ can be transferred to the zero state by means of a sequence of control vectors, the system is called controllable.


We simply say that $(A, B)$ is reachable (or controllable), since reachability (or controllability) depends only on the pair $(A, B)$.


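Theorem 3.4. Each of the following is a necessary and sufficient condition for the pair $(A, B)$ to be reachable.

(i) Define the reachability matrix as

$$
\mathcal{C} = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} \in \mathbb{R}^{n \times nm} \qquad (3.25)
$$

Then $\operatorname{rank}(\mathcal{C}) = n$ holds, or equivalently $\operatorname{Im}(\mathcal{C}) = \mathbb{R}^n$.

(ii) For any $\lambda \in \mathbb{C}$, $\operatorname{rank}\begin{bmatrix} A - \lambda I & B \end{bmatrix} = n$ holds.

(iii) The eigenvalues of $A + BK$ can be arbitrarily assigned by a suitable choice of $K \in \mathbb{R}^{m \times n}$.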


Proof. We prove item (i) only. By using (3.14) and (3.25), the state vector at time $n$ is described by

$$
\begin{aligned}
x(n) &= A^n x(0) + A^{n-1} B u(0) + \cdots + A B u(n-2) + B u(n-1) \\
&= A^n x(0) + \mathcal{C} \begin{bmatrix} u(n-1) \\ \vdots \\ u(0) \end{bmatrix}
\end{aligned}
$$

Let $x(0) = 0$. Then we see that the vector $x(n)$ takes arbitrary values in $\mathbb{R}^n$ if and only if item (i) holds. For items (ii) and (iii), see Kailath [80].
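The proof of item (i) is constructive. When $\mathcal{C}$ is square and nonsingular, the input sequence that steers $x(0) = 0$ to a target $\xi$ follows from solving $\mathcal{C}[u(n-1); \ldots; u(0)] = \xi$. The sketch below assumes NumPy and uses hypothetical two-state examples. It builds (3.25), tests its rank, and simulates the steering.

```python
import numpy as np


def reachability_matrix(A, B):
    """Return [B, AB, ..., A^{n-1} B], cf. (3.25)."""
    blocks, AkB = [], B
    for _ in range(A.shape[0]):
        blocks.append(AkB)
        AkB = A @ AkB
    return np.hstack(blocks)


def is_reachable(A, B):
    return np.linalg.matrix_rank(reachability_matrix(A, B)) == A.shape[0]


A = np.array([[0.5, 1.0], [0.0, 0.3]])
B_good = np.array([[0.0], [1.0]])  # input drives state 2, which drives state 1
B_bad = np.array([[1.0], [0.0]])   # state 2 is never affected by the input

# Steer x(0) = 0 to xi at time n = 2 by solving C [u(1); u(0)] = xi (square since m = 1)
xi = np.array([1.0, -2.0])
u_stack = np.linalg.solve(reachability_matrix(A, B_good), xi)  # [u(n-1), ..., u(0)]
u = u_stack[::-1]                                             # u[t] = u(t)
x = np.zeros(2)
for t in range(2):
    x = A @ x + B_good[:, 0] * u[t]
print(is_reachable(A, B_good), is_reachable(A, B_bad), x)
```

With $m > 1$, the system for the inputs is underdetermined. A least-squares or minimum-norm solution would then be a natural choice.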


From Definition 3.4, $(A, B)$ is controllable if and only if, for every initial state, there exists a sequence of control vectors that transfers the state to zero at time $n$. This is equivalent to

$$
A^n x(0) \in \operatorname{Im}(\mathcal{C}), \quad \forall x(0) \in \mathbb{R}^n
$$

Thus if $A$ is nonsingular, the above condition reduces to $\operatorname{Im}(\mathcal{C}) = \mathbb{R}^n$, which is equivalent to item (i) of Theorem 3.4. Hence if $A$ is nonsingular, the reachability and controllability of $(A, B)$ are equivalent.


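Theorem 3.5. Suppose that the pair $(A, B)$ is not reachable, and let $\operatorname{rank}(\mathcal{C}) = n_c < n$. Then there exists a nonsingular matrix $T$ such that $A$ and $B$ are decomposed as

$$
T^{-1} A T = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \quad T^{-1} B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \qquad (3.26)
$$

where $A_{11} \in \mathbb{R}^{n_c \times n_c}$, $B_1 \in \mathbb{R}^{n_c \times m}$, and $(A_{11}, B_1)$ is reachable.

Proof. For a proof, see Kailath [80].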


Definition 3.5. We say that $(A, B)$ is stabilizable if there exists a matrix $K \in \mathbb{R}^{m \times n}$ such that $A + BK$ is stable, i.e., $\rho(A + BK) < 1$. This is equivalent to saying that the system $\Sigma$ can be stabilized by a state feedback control $u = Kx$.

Theorem 3.6. Each of the following is a necessary and sufficient condition for the pair $(A, B)$ to be stabilizable.

(i) For any $\lambda \in \mathbb{C}$ with $|\lambda| \geq 1$, we have $\operatorname{rank}\begin{bmatrix} A - \lambda I & B \end{bmatrix} = n$.

(ii) Suppose that $A$ and $B$ are decomposed as in (3.26). Then $A_{22}$ is stable, i.e., $\rho(A_{22}) < 1$ holds.

Proof. For a proof, see Kailath [80].
We now introduce observability for the discrete-time LTI system, which is the dual of reachability.

Definition 3.6. Let $u = 0$ in (3.14). We say that the system is observable if the initial state $x(0)$ is uniquely determined by the $n$ output observations $y(0), y(1), \ldots, y(n-1)$. In this case, we say that $(C, A)$ is observable. This is equivalent to the following: if both the input and output are zero, i.e., $u(t) = 0$ and $y(t) = 0$ for $t = 0, 1, \ldots, n-1$, then the initial state is $x(0) = 0$.

Theorem 3.7. Each of the following is a necessary and sufficient condition for the pair $(C, A)$ to be observable.

(i) Define the observability matrix as

$$
\mathcal{O} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \in \mathbb{R}^{np \times n} \qquad (3.27)
$$

Then $\operatorname{rank}(\mathcal{O}) = n$, or equivalently $\operatorname{Ker}(\mathcal{O}) = \{0\}$.

(ii) For any $\lambda \in \mathbb{C}$, $\operatorname{rank}\begin{bmatrix} A - \lambda I \\ C \end{bmatrix} = n$ holds.

(iii) All the eigenvalues of $A + LC$ can be specified arbitrarily by a suitable choice of $L \in \mathbb{R}^{n \times p}$.

Proof. For a proof, see [80].
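The observability matrix (3.27) and its duality with (3.25) can be checked directly. The sketch below assumes NumPy and uses hypothetical two-state examples. In the first example the measured state carries information about the other state. In the second example one state never reaches the output.

```python
import numpy as np


def observability_matrix(C, A):
    """Return [C; CA; ...; CA^{n-1}], cf. (3.27)."""
    rows, CAk = [], C
    for _ in range(A.shape[0]):
        rows.append(CAk)
        CAk = CAk @ A
    return np.vstack(rows)


A = np.array([[0.5, 0.0], [1.0, 0.3]])
C_good = np.array([[0.0, 1.0]])  # state 2 is measured and is driven by state 1
C_bad = np.array([[1.0, 0.0]])   # state 2 never influences the output

O_good = observability_matrix(C_good, A)
O_bad = observability_matrix(C_bad, A)

# Duality: O(C, A)^T = [C^T, A^T C^T, ..., (A^T)^{n-1} C^T], the reachability matrix of (A^T, C^T)
O_dual = np.hstack([np.linalg.matrix_power(A.T, k) @ C_good.T for k in range(2)])
print(np.linalg.matrix_rank(O_good), np.linalg.matrix_rank(O_bad))
```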


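Theorem 3.8. Suppose that $(C, A)$ is not observable, and let $\operatorname{rank}(\mathcal{O}) = n_o < n$. Then there exists a nonsingular matrix $T$ such that $A$ and $C$ are decomposed as

$$
T^{-1} A T = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix}, \quad C T = \begin{bmatrix} C_1 & 0 \end{bmatrix} \qquad (3.28)
$$

where $A_{11} \in \mathbb{R}^{n_o \times n_o}$ and $C_1 \in \mathbb{R}^{p \times n_o}$, with the pair $(C_1, A_{11})$ observable.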


Now we provide the definition of detectability, which is weaker than observability.

Definition 3.7. Let $u = 0$ in (3.14). If $\lim_{t \to \infty} y(t) = 0$ implies $\lim_{t \to \infty} x(t) = 0$, then $(C, A)$ is called detectable.

Theorem 3.9. Each of the following is a necessary and sufficient condition for the pair $(C, A)$ to be detectable.

(i) There exists a matrix $L \in \mathbb{R}^{n \times p}$ such that $A + LC$ is stable.

(ii) For any $\lambda \in \mathbb{C}$ with $|\lambda| \geq 1$, $\operatorname{rank}\begin{bmatrix} A - \lambda I \\ C \end{bmatrix} = n$ holds.




(iii) Suppose that $A$ and $C$ are decomposed as in (3.28). Then $\rho(A_{22}) < 1$ holds.


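Proof. We show item (iii) by using Definition 3.7. Define $x = T\bar{x}$, where $\bar{x} = [\bar{x}_1^T \ \bar{x}_2^T]^T$ is partitioned conformably with (3.28). It then follows from (3.28) that

$$
\begin{aligned}
\bar{x}_1(t+1) &= A_{11} \bar{x}_1(t) && (3.29a) \\
\bar{x}_2(t+1) &= A_{21} \bar{x}_1(t) + A_{22} \bar{x}_2(t) && (3.29b) \\
y(t) &= C_1 \bar{x}_1(t) && (3.29c)
\end{aligned}
$$

From (3.29a) and (3.29c),

$$
\begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+n_o-1) \end{bmatrix} = \begin{bmatrix} C_1 \\ C_1 A_{11} \\ \vdots \\ C_1 A_{11}^{n_o-1} \end{bmatrix} \bar{x}_1(t)
$$

Since $(C_1, A_{11})$ is observable, the observability matrix formed by $(C_1, A_{11})$ has full column rank. Thus $\lim_{t\to\infty} y(t) = 0$ implies $\lim_{t\to\infty} \bar{x}_1(t) = 0$. Hence it suffices to find the condition under which $\bar{x}_2(t)$ converges to zero whenever $\bar{x}_1(t)$ tends to zero. From (3.29b) it follows that

$$
\bar{x}_2(t) = A_{22}^t \bar{x}_2(0) + \sum_{k=0}^{t-1} A_{22}^{t-k-1} A_{21} \bar{x}_1(k)
$$

It can be shown that $\lim_{t\to\infty} \bar{x}_2(t) = 0$ holds if $A_{22}$ is stable (see Problems 3.6 and 3.7). This shows that the detectability of $(C, A)$ requires that the unobservable modes be stable.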
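The rank test of Theorem 3.9 (ii) can be turned into a short check. The sketch below assumes NumPy. The function names are hypothetical, and the absolute tolerance is an assumption that a careful implementation would scale to the data. Since the rank of $[A - \lambda I; C]$ can only drop at eigenvalues of $A$, it suffices to test those eigenvalues. The stabilizability test of Theorem 3.6 (i) then follows by duality.

```python
import numpy as np


def is_detectable(C, A, tol=1e-9):
    """PBH test of Theorem 3.9 (ii): rank [A - lam I; C] = n for all |lam| >= 1.

    The rank can only drop at eigenvalues of A, so only those are checked.
    """
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1.0 - tol:
            M = np.vstack([A - lam * np.eye(n), C])
            if np.linalg.matrix_rank(M, tol=tol) < n:
                return False
    return True


def is_stabilizable(A, B, tol=1e-9):
    """Theorem 3.6 (i), via duality: (A, B) stabilizable iff (B^T, A^T) detectable."""
    return is_detectable(B.T, A.T, tol)


# Hypothetical examples: the first state is unobservable in both cases
C = np.array([[0.0, 1.0]])
A1 = np.diag([0.5, 2.0])   # unobservable mode 0.5 is stable
A2 = np.diag([1.5, 0.2])   # unobservable mode 1.5 is unstable
print(is_detectable(C, A1), is_detectable(C, A2))
```

Both examples are unobservable. Only the second fails the test, because its unobservable mode lies outside the unit circle.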


Theorem 3.10. Suppose that $(C, A)$ is detectable (observable). Then $A$ is stable if and only if the Lyapunov equation

$$
P = A^T P A + C^T C \qquad (3.30)
$$

has a unique nonnegative (positive) definite solution $P$.

Proof. (Necessity) If $A$ is stable, then the solution of (3.30) is given by

$$
P = \sum_{i=0}^{\infty} (A^T)^i C^T C A^i
$$

Since $C^T C \geq 0$, we have $P \geq 0$ ($P > 0$ if and only if $(C, A)$ is observable). For the uniqueness of $P$, see the proof of Theorem 3.3.

(Sufficiency) Conversely, suppose that (3.30) has a nonnegative definite solution $P$, but $A$ is not stable. Then there exist an unstable eigenvalue $\lambda_0$ and a nonzero vector $\xi \in \mathbb{C}^n$ such that

$$
A \xi = \lambda_0 \xi, \quad |\lambda_0| \geq 1, \quad \xi \neq 0 \qquad (3.31)
$$
Pre-multiplying (3.30) by $\xi^H$ and post-multiplying by $\xi$ yield

$$
\xi^H P \xi = \xi^H A^T P A \xi + \xi^H C^T C \xi = |\lambda_0|^2 \xi^H P \xi + \xi^H C^T C \xi
$$

so that we get $(|\lambda_0|^2 - 1)\xi^H P \xi + \xi^H C^T C \xi = 0$. Since $|\lambda_0| \geq 1$ and $P \geq 0$, both terms on the left-hand side are nonnegative, and hence both are zero. Thus we have $C\xi = 0$, which together with (3.31) shows that

$$
A \xi = \lambda_0 \xi, \quad C \xi = 0, \quad |\lambda_0| \geq 1, \quad \xi \neq 0
$$

It follows from item (ii) of Theorem 3.9 that $(C, A)$ is not detectable, a contradiction. This completes the proof.


3.7 Canonical Decomposition of Linear Systems 


We consider the finite-dimensional block Hankel matrix defined by

$$
H_{n,n} = \begin{bmatrix}
CB & CAB & \cdots & CA^{n-1}B \\
CAB & CA^2B & \cdots & CA^{n}B \\
\vdots & \vdots & \ddots & \vdots \\
CA^{n-1}B & CA^{n}B & \cdots & CA^{2n-2}B
\end{bmatrix} \in \mathbb{R}^{pn \times mn} \qquad (3.32)
$$

This is called the Hankel matrix associated with the system $\Sigma = (A, B, C, D)$. Its elements are the impulse response matrices of the discrete-time LTI system $\Sigma$.

In terms of the observability matrix $\mathcal{O}$ of (3.27) and the reachability matrix $\mathcal{C}$ of (3.25), the block Hankel matrix is decomposed as $H_{n,n} = \mathcal{O}\mathcal{C}$. Thus we see that $\operatorname{rank}(H_{n,n}) = n_h \leq \min(n_o, n_c) \leq n$ (see Lemma 3.11 below).
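The factorization $H_{n,n} = \mathcal{O}\mathcal{C}$ can be verified numerically. The sketch below assumes NumPy and a hypothetical random triplet, which is generically minimal. It builds (3.32) from the Markov parameters and compares it with the product of (3.27) and (3.25).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 3, 2, 2
# Hypothetical random system (generically minimal)
A = 0.4 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

Y = [C @ np.linalg.matrix_power(A, i) @ B for i in range(2 * n - 1)]  # Y[i] = C A^i B
H = np.block([[Y[i + j] for j in range(n)] for i in range(n)])        # (3.32)
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])   # (3.27)
Cr = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])  # (3.25)

print(np.allclose(H, O @ Cr), np.linalg.matrix_rank(H))
```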

The following is the canonical decomposition theorem due to Kalman [82]. 


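Theorem 3.11. (Canonical decomposition) By means of a nonsingular transform, the system $\Sigma = (A, B, C, D)$ can be reduced to $\bar{\Sigma} = (\bar{A}, \bar{B}, \bar{C}, D)$ of the form

$$
\begin{bmatrix} \bar{x}_{c\bar{o}}(t+1) \\ \bar{x}_{co}(t+1) \\ \bar{x}_{\bar{c}\bar{o}}(t+1) \\ \bar{x}_{\bar{c}o}(t+1) \end{bmatrix}
=
\begin{bmatrix}
\bar{A}_{11} & \bar{A}_{12} & \bar{A}_{13} & \bar{A}_{14} \\
0 & \bar{A}_{22} & 0 & \bar{A}_{24} \\
0 & 0 & \bar{A}_{33} & \bar{A}_{34} \\
0 & 0 & 0 & \bar{A}_{44}
\end{bmatrix}
\begin{bmatrix} \bar{x}_{c\bar{o}}(t) \\ \bar{x}_{co}(t) \\ \bar{x}_{\bar{c}\bar{o}}(t) \\ \bar{x}_{\bar{c}o}(t) \end{bmatrix}
+
\begin{bmatrix} \bar{B}_1 \\ \bar{B}_2 \\ 0 \\ 0 \end{bmatrix} u(t) \qquad (3.33a)
$$

$$
y(t) = \begin{bmatrix} 0 & \bar{C}_2 & 0 & \bar{C}_4 \end{bmatrix}
\begin{bmatrix} \bar{x}_{c\bar{o}}(t) \\ \bar{x}_{co}(t) \\ \bar{x}_{\bar{c}\bar{o}}(t) \\ \bar{x}_{\bar{c}o}(t) \end{bmatrix} + D u(t) \qquad (3.33b)
$$

where the vector $\bar{x}_{c\bar{o}}(t)$ is reachable but not observable, $\bar{x}_{co}(t)$ is reachable and observable, $\bar{x}_{\bar{c}\bar{o}}(t)$ is neither reachable nor observable, and $\bar{x}_{\bar{c}o}(t)$ is observable but not reachable. Also, $\dim \bar{x}_{c\bar{o}}(t) = n_c - n_h$, $\dim \bar{x}_{co}(t) = n_h$, $\dim \bar{x}_{\bar{c}\bar{o}}(t) = n - n_o - n_c + n_h$, and $\dim \bar{x}_{\bar{c}o}(t) = n_o - n_h$.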
Figure 3.2. Canonical decomposition of LTI system

Figure 3.2 shows the canonical structure of the linear system $\bar{\Sigma}$. Here $\Sigma_{c\bar{o}}$, $\Sigma_{co}$, $\Sigma_{\bar{c}\bar{o}}$ and $\Sigma_{\bar{c}o}$ denote the subsystems whose state vectors are $\bar{x}_{c\bar{o}}(t)$, $\bar{x}_{co}(t)$, $\bar{x}_{\bar{c}\bar{o}}(t)$ and $\bar{x}_{\bar{c}o}(t)$, respectively. The arrows reflect the block structure of the system matrices in $\bar{\Sigma}$.

As mentioned in Section 3.4, the external description of the system $\Sigma$ is invariant under nonsingular transforms, so that the transfer matrices and impulse response matrices of $\Sigma$ and $\bar{\Sigma}$ are the same. They are given by

$$
G(z) = \bar{C}_2 (zI - \bar{A}_{22})^{-1} \bar{B}_2 + D
$$

and

$$
G_t = \bar{C}_2 \bar{A}_{22}^{t-1} \bar{B}_2, \quad t = 1, 2, \ldots
$$

Hence we see that $\Sigma_{co} := (\bar{A}_{22}, \bar{B}_2, \bar{C}_2, D)$, $\Sigma$ and $\bar{\Sigma}$ are all input-output equivalent. This implies that the transfer matrix of a system depends only on the subsystem $\Sigma_{co}$. In other words, the models we obtain from input-output data by using system identification techniques are necessarily those of the subsystem $\Sigma_{co}$.
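A small worked check of this conclusion, assuming NumPy. The 4-state system below is a hypothetical example already written in the form (3.33), with one state per subsystem. Its transfer function coincides with that of the one-state subsystem $\Sigma_{co}$, and its Hankel matrix has rank 1 even though $n = 4$.

```python
import numpy as np

# Hypothetical 4-state system already in the form (3.33), one state per subsystem,
# ordered as (reachable/unobservable, reachable/observable,
#             unreachable/unobservable, unreachable/observable)
A = np.array([[0.5, 0.3, -0.2, 0.1],
              [0.0, 0.7,  0.0, 0.4],
              [0.0, 0.0, -0.6, 0.2],
              [0.0, 0.0,  0.0, 0.9]])
B = np.array([[1.0], [2.0], [0.0], [0.0]])
C = np.array([[0.0, 1.5, 0.0, -1.0]])
D = np.array([[0.2]])


def G(A, B, C, D, z):
    return D + C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)


# Reachable-and-observable part Sigma_co = (A22, B2, C2, D)
A22, B2, C2 = A[1:2, 1:2], B[1:2, :], C[:, 1:2]
zs = np.exp(1j * np.linspace(0.1, 3.0, 7))  # points on the unit circle
max_diff = max(np.abs(G(A, B, C, D, z) - G(A22, B2, C2, D, z)).max() for z in zs)

# The Hankel matrix sees only Sigma_co, so its rank is 1 although n = 4
Y = [C @ np.linalg.matrix_power(A, i) @ B for i in range(7)]
H = np.block([[Y[i + j] for j in range(4)] for i in range(4)])
print(max_diff, np.linalg.matrix_rank(H))
```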

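Given a transfer matrix $G(z)$, a system $\Sigma = (A, B, C, D)$ with $G(z) = D + C(zI - A)^{-1}B$ is called a realization of $G(z)$. As shown above, realizations are not unique. A realization with the least dimension is referred to as a minimal realization, which is unique up to nonsingular transforms. In fact, we have the following theorem.

Theorem 3.12. A triplet $(A, B, C)$ is minimal if and only if $(A, B)$ is reachable and $(C, A)$ is observable. Moreover, if both $\Sigma_1 = (A_1, B_1, C_1, D_1)$ and $\Sigma_2 = (A_2, B_2, C_2, D_2)$ are minimal realizations of $G(z)$, then the relation (3.19) holds for some nonsingular transform $T$.

Proof. The first part is obvious from Theorem 3.11. We show the second part. Define the reachability and observability matrices as

$$
\mathcal{C}_1 = \begin{bmatrix} B_1 & A_1 B_1 & \cdots & A_1^{n-1} B_1 \end{bmatrix}, \quad
\mathcal{C}_2 = \begin{bmatrix} B_2 & A_2 B_2 & \cdots & A_2^{n-1} B_2 \end{bmatrix}
$$

$$
\mathcal{O}_1 = \begin{bmatrix} C_1 \\ C_1 A_1 \\ \vdots \\ C_1 A_1^{n-1} \end{bmatrix}, \quad
\mathcal{O}_2 = \begin{bmatrix} C_2 \\ C_2 A_2 \\ \vdots \\ C_2 A_2^{n-1} \end{bmatrix}
$$

Then, from the hypothesis, we have $D_1 = D_2$ and

$$
C_1 (zI - A_1)^{-1} B_1 = C_2 (zI - A_2)^{-1} B_2
$$

By using a series expansion of the above relation, it can easily be shown that

$$
C_1 A_1^i B_1 = C_2 A_2^i B_2, \quad i = 0, 1, \ldots
$$

This implies that $\mathcal{O}_1 \mathcal{C}_1 = \mathcal{O}_2 \mathcal{C}_2$. Since $\mathcal{O}_i$ and $\mathcal{C}_i$ have full rank, we can define two matrices

$$
T_1 = \mathcal{C}_2 \mathcal{C}_1^T (\mathcal{C}_1 \mathcal{C}_1^T)^{-1}, \quad T_2 = (\mathcal{O}_1^T \mathcal{O}_1)^{-1} \mathcal{O}_1^T \mathcal{O}_2
$$

It can be shown that $T_2 T_1 = I_n$, implying that both $T_1$ and $T_2$ are nonsingular with $T_2 = T_1^{-1}$. Also, we have $T_1 \mathcal{C}_1 = \mathcal{C}_2$ and $\mathcal{O}_2 T_1 = \mathcal{O}_1$. Moreover, $C_1 A_1^{i+1} B_1 = C_2 A_2^{i+1} B_2$ gives $\mathcal{O}_1 A_1 \mathcal{C}_1 = \mathcal{O}_2 A_2 \mathcal{C}_2$, so that

$$
\mathcal{O}_1 A_1 \mathcal{C}_1 = \mathcal{O}_2 A_2 \mathcal{C}_2 = \mathcal{O}_1 T_1^{-1} A_2 T_1 \mathcal{C}_1
$$

Since $\operatorname{rank}(\mathcal{O}_1) = n$ and $\operatorname{rank}(\mathcal{C}_1) = n$, it follows that $A_1 = T_1^{-1} A_2 T_1$. Comparing the first block columns of $T_1 \mathcal{C}_1 = \mathcal{C}_2$ yields $T_1 B_1 = B_2$. Similarly, from $\mathcal{O}_2 T_1 = \mathcal{O}_1$, we have $C_2 T_1 = C_1$. Thus $(A_1, B_1, C_1)$ and $(A_2, B_2, C_2)$ are related by (3.19) with $T = T_1$, which completes the proof.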
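The proof is constructive, so the transform can be recovered numerically from the reachability and observability matrices. The sketch below assumes NumPy. It builds a second realization from a hypothetical minimal triplet and a known $T$ via (3.19). The formulas for $T_1$ and $T_2$ then return that $T$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 3, 2, 2
# Hypothetical minimal realization Sigma_2 and a second one Sigma_1 built via (3.19)
A2 = 0.5 * rng.standard_normal((n, n))
B2 = rng.standard_normal((n, m))
C2 = rng.standard_normal((p, n))
T_true = rng.standard_normal((n, n)) + 2.0 * np.eye(n)
A1 = np.linalg.solve(T_true, A2 @ T_true)   # T^{-1} A2 T
B1 = np.linalg.solve(T_true, B2)            # T^{-1} B2
C1 = C2 @ T_true                            # C2 T


def reach(A, B):
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(A.shape[0])])


def obsv(C, A):
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(A.shape[0])])


Cr1, Cr2 = reach(A1, B1), reach(A2, B2)
O1, O2 = obsv(C1, A1), obsv(C2, A2)
# The two matrices from the proof of Theorem 3.12
T1 = Cr2 @ Cr1.T @ np.linalg.inv(Cr1 @ Cr1.T)
T2 = np.linalg.solve(O1.T @ O1, O1.T @ O2)
print(np.allclose(T2 @ T1, np.eye(n)), np.allclose(T1, T_true))
```

Because both realizations are minimal, the transform is unique, and $T_1$ coincides with the matrix used to generate $\Sigma_1$.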


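Example 3.3. Consider the transfer function of an SISO system

$$
G(z) = \frac{b_1 z^{n-1} + \cdots + b_{n-1} z + b_n}{z^n + a_1 z^{n-1} + \cdots + a_{n-1} z + a_n}
$$

It is easy to see that both

$$
A = \begin{bmatrix}
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
-a_n & -a_{n-1} & \cdots & -a_1
\end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad
C = \begin{bmatrix} b_n & b_{n-1} & \cdots & b_1 \end{bmatrix}
$$

and

$$
\bar{A} = \begin{bmatrix}
0 & \cdots & 0 & -a_n \\
1 & \cdots & 0 & -a_{n-1} \\
\vdots & \ddots & & \vdots \\
0 & \cdots & 1 & -a_1
\end{bmatrix}, \quad
\bar{B} = \begin{bmatrix} b_n \\ b_{n-1} \\ \vdots \\ b_1 \end{bmatrix}, \quad
\bar{C} = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}
$$

are realizations of $G(z)$. It is clear that $(A, B)$ is reachable and $(\bar{C}, \bar{A})$ is observable, and that $\bar{A} = A^T$, $\bar{B} = C^T$, $\bar{C} = B^T$ hold.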
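The two companion-form realizations of Example 3.3 are easy to build programmatically. The sketch below assumes NumPy. The helper `canonical_forms` and the coefficient values are hypothetical. It checks both realizations against a direct evaluation of the rational function.

```python
import numpy as np


def canonical_forms(a, b):
    """Both realizations of Example 3.3 for
    G(z) = (b1 z^{n-1} + ... + bn) / (z^n + a1 z^{n-1} + ... + an).

    a = [a1, ..., an], b = [b1, ..., bn]; returns (A, B, C) and (A_bar, B_bar, C_bar).
    """
    n = len(a)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)               # ones on the superdiagonal
    A[-1, :] = -np.asarray(a, float)[::-1]   # last row: -an, ..., -a1
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0
    C = np.asarray(b, float)[::-1].reshape(1, n)  # [bn, ..., b1]
    return (A, B, C), (A.T, C.T, B.T)       # A_bar = A^T, B_bar = C^T, C_bar = B^T


def tf(A, B, C, z):
    return (C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B))[0, 0]


a, b = [-0.5, 0.3, -0.1], [1.0, 0.4, 0.2]   # hypothetical coefficients, n = 3
(A, B, C), (Ab, Bb, Cb) = canonical_forms(a, b)
z = 1.3 + 0.4j
direct = np.polyval(b, z) / np.polyval([1.0] + a, z)
print(tf(A, B, C, z), tf(Ab, Bb, Cb, z), direct)
```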
3.8 Balanced Realization and Model Reduction


For a given linear system, there exist infinitely many realizations. Among them, the balanced realization described below is quite useful in modeling and system identification. First we give the definition of the two Gramians associated with a discrete-time LTI system.


Definition 3.8. Let a realization be given by $(A, B, C)$ with $A$ stable. Consider the two Lyapunov equations defined by

$$
P = A P A^T + B B^T \qquad (3.34)
$$

and

$$
Q = A^T Q A + C^T C \qquad (3.35)
$$

Then the solutions $P$ and $Q$ are called the reachability Gramian and the observability Gramian, respectively. Both are nonnegative definite. Also, the square roots of the eigenvalues of $PQ$ are called the Hankel singular values of $(A, B, C)$.


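Lemma 3.5. Suppose that $A$ is stable. Then we have

$$
(A, B) \text{ reachable} \iff P > 0; \qquad (C, A) \text{ observable} \iff Q > 0
$$

Proof. (Necessity) Since $A$ is stable, the solution of (3.34) is given by

$$
P = \sum_{i=0}^{\infty} A^i B B^T (A^T)^i \geq \sum_{i=0}^{n-1} A^i B B^T (A^T)^i = \mathcal{C}\mathcal{C}^T
$$

Thus, if $(A, B)$ is reachable, $P > 0$ follows.

(Sufficiency) Suppose that $P$ is positive definite, but $(A, B)$ is not reachable. Then, by Theorem 3.4 (ii), there exist $\eta \in \mathbb{C}^n$ and $\lambda \in \mathbb{C}$ such that

$$
\eta^H A = \lambda \eta^H, \quad \eta^H B = 0, \quad \eta \neq 0
$$

where $|\lambda| < 1$ since $A$ is stable. Pre-multiplying (3.34) by $\eta^H$ and post-multiplying by $\eta$ yield

$$
\eta^H P \eta = \eta^H A P A^T \eta + \eta^H B B^T \eta = |\lambda|^2 \eta^H P \eta
$$

so that $(1 - |\lambda|^2)\eta^H P \eta = 0$. Since $1 - |\lambda|^2 \neq 0$, we get $\eta^H P \eta = 0$. This implies that $P$ is not positive definite, a contradiction, which proves the first half of this lemma. The assertion for the Gramian $Q$ is proved similarly.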
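The Gramians (3.34) and (3.35) and the Hankel singular values can be computed directly for small systems. The sketch below assumes NumPy. The helper `dlyap` is a hypothetical name that solves $X = AXA^T + Q$ by vectorization, and the system matrices are a hypothetical example whose reachability and observability matrices are nonsingular. By Lemma 3.5, both Gramians therefore come out positive definite.

```python
import numpy as np


def dlyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only; hypothetical helper)."""
    n = A.shape[0]
    X = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)


# Hypothetical stable, reachable and observable test system
A = np.array([[0.6, 0.2, 0.0],
              [-0.1, 0.5, 0.3],
              [0.0, 0.0, -0.4]])
B = np.array([[1.0], [0.0], [1.0]])
C = np.array([[1.0, 1.0, 0.0]])

P = dlyap(A, B @ B.T)      # reachability Gramian, (3.34)
Q = dlyap(A.T, C.T @ C)    # observability Gramian, (3.35)
hsv = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])  # Hankel singular values
print(np.linalg.eigvalsh(P).min() > 0, np.linalg.eigvalsh(Q).min() > 0, hsv)
```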


Definition 3.9. Let $G(z) = (A, B, C, D)$ be a minimal realization. Then it is called a balanced realization if the following conditions (i) and (ii) hold.

(i) The matrix $A$ is stable, i.e., $\rho(A) < 1$.

(ii) The Gramians $P$ and $Q$ are equal and diagonal, i.e., there exists a diagonal matrix

$$
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n), \quad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n > 0
$$

satisfying

$$
\Sigma = A \Sigma A^T + B B^T, \quad \Sigma = A^T \Sigma A + C^T C \qquad (3.36)
$$

Note that $\sigma_1, \sigma_2, \ldots, \sigma_n$ are the Hankel singular values of $(A, B, C)$.
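A balanced realization can be obtained from any stable minimal realization by a suitable $T$ in (3.19). The sketch below assumes NumPy and uses the square-root balancing method, a standard technique that the text does not describe. With Cholesky factors $P = L_p L_p^T$ and $Q = L_q L_q^T$ and the SVD $L_q^T L_p = U \Sigma V^T$, the choice $T = L_p V \Sigma^{-1/2}$ makes both Gramians equal to $\Sigma$. The helper names are hypothetical.

```python
import numpy as np


def dlyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only)."""
    n = A.shape[0]
    X = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)


def balance(A, B, C):
    """Square-root balancing of a stable minimal (A, B, C).

    Returns (Ab, Bb, Cb, s) such that both Gramians of (Ab, Bb, Cb) equal diag(s),
    with s the Hankel singular values in decreasing order.
    """
    P = dlyap(A, B @ B.T)                 # (3.34)
    Q = dlyap(A.T, C.T @ C)               # (3.35)
    Lp = np.linalg.cholesky(P)            # P = Lp Lp^T, needs P > 0
    Lq = np.linalg.cholesky(Q)            # Q = Lq Lq^T, needs Q > 0
    U, s, Vt = np.linalg.svd(Lq.T @ Lp)   # s**2 are the eigenvalues of Q P
    S_inv_half = np.diag(s ** -0.5)
    T = Lp @ Vt.T @ S_inv_half            # old state x = T x_b
    T_inv = S_inv_half @ U.T @ Lq.T
    return T_inv @ A @ T, T_inv @ B, C @ T, s


A = np.array([[0.6, 0.2, 0.0],
              [-0.1, 0.5, 0.3],
              [0.0, 0.0, -0.4]])
B = np.array([[1.0], [0.0], [1.0]])
C = np.array([[1.0, 1.0, 0.0]])
Ab, Bb, Cb, s = balance(A, B, C)
S = np.diag(s)
print(s, np.linalg.norm(Ab, 2))
```

The printed 2-norm of the balanced $A$ is at most 1, in line with Lemma 3.6 below.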


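Lemma 3.6. If $(A, B, C, D)$ is a balanced realization, then the 2-norm of $A$, i.e., its maximum singular value, satisfies $\|A\|_2 = \bar{\sigma}(A) \leq 1$. Moreover, if all the diagonal elements of $\Sigma$ are different, we have $\|A\|_2 < 1$.

Proof. We prove the first part of the lemma. Pre-multiply the first equation of (3.36) by $A^T$ and post-multiply by $A$. Substituting the resulting expression for $A^T \Sigma A$ into the second equation yields

$$
\Sigma = A^T A \Sigma A^T A + A^T B B^T A + C^T C \qquad (3.37)
$$

Let $\lambda \geq 0$ be an eigenvalue of $A^T A$, and $v \in \mathbb{R}^n$ a corresponding eigenvector, i.e., $A^T A v = \lambda v$, $v \neq 0$. Pre-multiplying (3.37) by $v^T$ and post-multiplying by $v$ yield

$$
(\lambda^2 - 1) v^T \Sigma v = -(v^T A^T B B^T A v + v^T C^T C v) \leq 0
$$

Since $v^T \Sigma v > 0$, we have $\lambda^2 \leq 1$. But $\sqrt{\lambda}$ is a singular value of $A$, so that we get $\|A\|_2 \leq 1$. For a proof of the latter half, see [127].

Partition $\Sigma$ into $\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$ and $\Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \ldots, \sigma_n)$, and accordingly write $A$, $B$, $C$ as

$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad C = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \qquad (3.38)
$$

where $A_{11} \in \mathbb{R}^{r \times r}$, $B_1 \in \mathbb{R}^{r \times m}$ and $C_1 \in \mathbb{R}^{p \times r}$.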


Lemma 3.7. Suppose that $(A, B, C, D)$ is a balanced realization with $A$ stable. From (3.38), we define a reduced order model

$$
G_r(z) = (A_{11}, B_1, C_1, D) \qquad (3.39)
$$

Then the following (i) to (iii) hold*.

(i) The model $G_r(z)$ is stable.

*Unlike continuous-time systems, the discrete-time model $G_r(z)$ is not balanced. If, however, we relax the definition of balanced realization by using Riccati inequalities rather than Riccati equations, then $G_r(z)$ may be called a balanced realization [185].
(ii) If $\sigma_r > \sigma_{r+1}$, then $G_r(z)$ is a minimal realization.

(iii) For any $r = 1, \ldots, n-1$, the following bound holds.

$$
\|G(e^{j\theta}) - G_r(e^{j\theta})\|_\infty \leq 2(\sigma_{r+1} + \cdots + \sigma_n) \qquad (3.40)
$$


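Proof. First we show (i). From (3.38), the first Lyapunov equation of (3.36) is rewritten as

$$
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
\begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix}
+ \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \begin{bmatrix} B_1^T & B_2^T \end{bmatrix}
$$

Thus we have

$$
\begin{aligned}
\Sigma_1 &= A_{11} \Sigma_1 A_{11}^T + A_{12} \Sigma_2 A_{12}^T + B_1 B_1^T && (3.41a) \\
\Sigma_2 &= A_{22} \Sigma_2 A_{22}^T + A_{21} \Sigma_1 A_{21}^T + B_2 B_2^T && (3.41b) \\
0 &= A_{11} \Sigma_1 A_{21}^T + A_{12} \Sigma_2 A_{22}^T + B_1 B_2^T && (3.41c)
\end{aligned}
$$

Let $\lambda \in \mathbb{C}$ be an eigenvalue of $A_{11}$, and $v \in \mathbb{C}^r$ a corresponding left eigenvector, i.e., $v^H A_{11} = \lambda v^H$, $v \neq 0$. Pre-multiplying (3.41a) by $v^H$ and post-multiplying by $v$ yield

$$
(1 - |\lambda|^2) v^H \Sigma_1 v = v^H A_{12} \Sigma_2 A_{12}^T v + v^H B_1 B_1^T v \qquad (3.42)
$$

Since the right-hand side of the above equation is nonnegative, and since $v^H \Sigma_1 v > 0$, we get $|\lambda| \leq 1$.

Now suppose that $|\lambda| = 1$. Then the left-hand side of (3.42) becomes 0. Noting that $\Sigma_2 > 0$, we then have

$$
v^H A_{12} = 0, \quad v^H B_1 = 0
$$

Since $v^H A_{11} = \lambda v^H$, it follows that

$$
\begin{bmatrix} v^H & 0 \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \lambda \begin{bmatrix} v^H & 0 \end{bmatrix}, \quad
\begin{bmatrix} v^H & 0 \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = 0
$$

But from Theorem 3.4 (ii), this contradicts the reachability of $(A, B)$. Hence $|\lambda| \neq 1$, so that $|\lambda| < 1$. A similar proof is also applicable to the second equation of (3.36).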

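Now we show item (ii). From the second equation of (3.36),

$$
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
= \begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
+ \begin{bmatrix} C_1^T \\ C_2^T \end{bmatrix} \begin{bmatrix} C_1 & C_2 \end{bmatrix}
$$

Hence the (1, 1)-block of the above equation gives

$$
\Sigma_1 = A_{11}^T \Sigma_1 A_{11} + A_{21}^T \Sigma_2 A_{21} + C_1^T C_1 \qquad (3.43)
$$

Suppose that $(C_1, A_{11})$ is not observable. Then there exist an eigenvalue $\lambda \in \mathbb{C}$ and an eigenvector $v \in \mathbb{C}^r$ of $A_{11}$ such that

$$
A_{11} v = \lambda v, \quad C_1 v = 0 \qquad (3.44)
$$

We assume without loss of generality that $\|v\| = 1$. Pre-multiplying (3.43) by $v^H$ and post-multiplying by $v$ yield

$$
(|\lambda|^2 - 1) v^H \Sigma_1 v + v^H A_{21}^T \Sigma_2 A_{21} v = 0
$$

Since

$$
\underline{\sigma}(\Sigma_1) \leq v^H \Sigma_1 v, \qquad v^H A_{21}^T \Sigma_2 A_{21} v \leq \|A_{21} v\|^2 \bar{\sigma}(\Sigma_2)
$$

hold, we see that

$$
(1 - |\lambda|^2)\, \underline{\sigma}(\Sigma_1) \leq \|A_{21} v\|^2 \bar{\sigma}(\Sigma_2)
$$

From Lemma 3.6, it follows that $\|A\|_2 \leq 1$, so that the norm of any submatrix of $A$ is at most 1. Hence we get

$$
\left\| \begin{bmatrix} A_{11} v \\ A_{21} v \end{bmatrix} \right\|^2 = \|A_{11} v\|^2 + \|A_{21} v\|^2 \leq 1
$$

Since $\|A_{11} v\|^2 = |\lambda|^2$ from (3.44), it follows that $\|A_{21} v\|^2 \leq 1 - |\lambda|^2$, implying that

$$
(1 - |\lambda|^2)\, \underline{\sigma}(\Sigma_1) \leq (1 - |\lambda|^2)\, \bar{\sigma}(\Sigma_2)
$$

Since $|\lambda|^2 < 1$ by item (i), we have $\underline{\sigma}(\Sigma_1) \leq \bar{\sigma}(\Sigma_2)$, i.e., $\sigma_r \leq \sigma_{r+1}$. But this contradicts the assumption that $\sigma_r > \sigma_{r+1}$. Hence we conclude that $(C_1, A_{11})$ is observable. Similarly, we can show that $(A_{11}, B_1)$ is reachable.

For a proof of (iii), see [5, 71, 185].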
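Items (i) and (iii) can be observed numerically. The sketch below assumes NumPy, a hypothetical random SISO system scaled so that $\rho(A) = 0.8$, and the square-root balancing method (a standard technique not described in the text). It truncates the balanced realization as in (3.39) and estimates the $H_\infty$ error on a frequency grid. A grid only gives a lower estimate of the supremum, so comparing it with (3.40) is a consistency check, not a proof.

```python
import numpy as np


def dlyap(A, Q):
    n = A.shape[0]
    X = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)


def balance(A, B, C):
    """Square-root balancing: both Gramians of the result equal diag(s)."""
    Lp = np.linalg.cholesky(dlyap(A, B @ B.T))
    Lq = np.linalg.cholesky(dlyap(A.T, C.T @ C))
    U, s, Vt = np.linalg.svd(Lq.T @ Lp)
    S_inv_half = np.diag(s ** -0.5)
    T, T_inv = Lp @ Vt.T @ S_inv_half, S_inv_half @ U.T @ Lq.T
    return T_inv @ A @ T, T_inv @ B, C @ T, s


def freq_resp(A, B, C, D, theta):
    z = np.exp(1j * theta)
    return D + C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)


rng = np.random.default_rng(3)
n, r = 4, 2
A = rng.standard_normal((n, n))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))   # hypothetical system with rho(A) = 0.8
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = np.zeros((1, 1))

Ab, Bb, Cb, s = balance(A, B, C)
A11, B1, C1 = Ab[:r, :r], Bb[:r], Cb[:, :r]   # direct truncation (3.39)

thetas = np.linspace(0.0, np.pi, 2001)        # real system: [0, pi] suffices
err = max(np.linalg.norm(freq_resp(A, B, C, D, t) - freq_resp(A11, B1, C1, D, t), 2)
          for t in thetas)
bound = 2 * s[r:].sum()                       # right-hand side of (3.40)
print(err, bound)
```

For a real system, $G(e^{-j\theta})$ is the complex conjugate of $G(e^{j\theta})$. This is why the grid only needs to cover $[0, \pi]$.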


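Similarly to the proof of Lemma 3.7 (i), we can show that the subsystem $(A_{22}, B_2, C_2)$ is also stable. Since $|\lambda(A_{22})| < 1$, the matrix $\alpha I - A_{22}$ is nonsingular for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$. Hence we can define $(A_r, B_r, C_r, D_r)$ as

$$
\begin{aligned}
A_r &= A_{11} + A_{12} (\alpha I - A_{22})^{-1} A_{21} && (3.45a) \\
B_r &= B_1 + A_{12} (\alpha I - A_{22})^{-1} B_2 && (3.45b) \\
C_r &= C_1 + C_2 (\alpha I - A_{22})^{-1} A_{21} && (3.45c) \\
D_r &= D + C_2 (\alpha I - A_{22})^{-1} B_2 && (3.45d)
\end{aligned}
$$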


Then we have the following lemma.

Lemma 3.8. Suppose that $(A, B, C, D)$ is a balanced realization. Then we have the following (i) to (iii).

(i) The triplet $(A_r, B_r, C_r)$ defined by (3.45) satisfies the Lyapunov equations

$$
\Sigma_1 = A_r \Sigma_1 A_r^H + B_r B_r^H, \quad \Sigma_1 = A_r^H \Sigma_1 A_r + C_r^H C_r
$$

where $\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$. Hence $(A_r, B_r, C_r)$ is a balanced realization with $\rho(A_r) \leq 1$.

(ii) If the Gramians $\Sigma_1$ and $\Sigma_2$ have no common elements, or $\sigma_r > \sigma_{r+1}$, then $A_r$ is stable, and $G_r(z) = (A_r, B_r, C_r, D_r)$ is a minimal realization.

(iii) The approximation error is the same as that of (3.40), i.e.,

$$
\|G(e^{j\theta}) - G_r(e^{j\theta})\|_\infty \leq 2(\sigma_{r+1} + \cdots + \sigma_n)
$$
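Proof. We show (i). For simplicity, define $\Phi = \alpha I - A_{22}$. Then, by the definitions of $A_r$ and $B_r$,

$$
\begin{aligned}
J &:= A_r \Sigma_1 A_r^H + B_r B_r^H \\
&= (A_{11} + A_{12} \Phi^{-1} A_{21}) \Sigma_1 (A_{11} + A_{12} \Phi^{-1} A_{21})^H \\
&\quad + (B_1 + A_{12} \Phi^{-1} B_2)(B_1 + A_{12} \Phi^{-1} B_2)^H
\end{aligned}
$$

We show that $J$ equals $\Sigma_1$ by a direct calculation. Expanding the above equation gives

$$
\begin{aligned}
J &= A_{11} \Sigma_1 A_{11}^T + A_{11} \Sigma_1 A_{21}^T \Phi^{-H} A_{12}^T + A_{12} \Phi^{-1} A_{21} \Sigma_1 A_{11}^T \\
&\quad + A_{12} \Phi^{-1} A_{21} \Sigma_1 A_{21}^T \Phi^{-H} A_{12}^T + B_1 B_1^T + B_1 B_2^T \Phi^{-H} A_{12}^T \\
&\quad + A_{12} \Phi^{-1} B_2 B_1^T + A_{12} \Phi^{-1} B_2 B_2^T \Phi^{-H} A_{12}^T
\end{aligned}
$$

Substituting $B_1 B_1^T$, $B_1 B_2^T$, $B_2 B_1^T$ and $B_2 B_2^T$ from (3.41) into the above equation, we get

$$
\begin{aligned}
J &= \Sigma_1 - A_{12} \Sigma_2 A_{12}^T - A_{12} \Sigma_2 A_{22}^T \Phi^{-H} A_{12}^T - A_{12} \Phi^{-1} A_{22} \Sigma_2 A_{12}^T \\
&\quad - A_{12} \Phi^{-1} A_{22} \Sigma_2 A_{22}^T \Phi^{-H} A_{12}^T + A_{12} \Phi^{-1} \Sigma_2 \Phi^{-H} A_{12}^T
\end{aligned}
$$

Collecting the terms involving $\Sigma_2$, and using $\Phi = \alpha I - A_{22}$, yields

$$
J - \Sigma_1 = A_{12} \Phi^{-1} (1 - |\alpha|^2) \Sigma_2 \Phi^{-H} A_{12}^T = 0
$$

since $|\alpha| = 1$. This proves the first Lyapunov equation $\Sigma_1 = A_r \Sigma_1 A_r^H + B_r B_r^H$ of this lemma. In a similar fashion, we can show $\Sigma_1 = A_r^H \Sigma_1 A_r + C_r^H C_r$.

(ii) This can be proved similarly to (ii) of Lemma 3.7.

(iii) This part is omitted. See references [5, 71, 108, 185].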


The reduced order model $G_r(z)$ obtained by Lemma 3.8 is called a balanced reduced order model for the discrete-time LTI system $G(z)$. The method of deriving $G_r(z)$ in Lemma 3.8 is called the singular perturbation approximation (SPA) method. It can easily be shown that $G(\alpha) = G_r(\alpha)$. Hence, choosing $\alpha = 1$ gives $G(1) = G_r(1)$, which means that the reduced order model obtained by the SPA method with $\alpha = 1$ preserves the steady state gain of $G(z)$. However, this does not hold for the direct method of Lemma 3.7.
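The SPA formulas (3.45) are straightforward to implement once a balanced realization is available. The sketch below assumes NumPy, a hypothetical random SISO system, and square-root balancing (a standard technique not described in the text). The helper `spa` is a hypothetical name. With $\alpha = 1$, it checks the preserved steady-state gain $G(1) = G_r(1)$ and the Lyapunov equations of Lemma 3.8 (i).

```python
import numpy as np


def dlyap(A, Q):
    n = A.shape[0]
    X = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)


def balance(A, B, C):
    """Square-root balancing; both Gramians of the result equal diag(s)."""
    Lp = np.linalg.cholesky(dlyap(A, B @ B.T))
    Lq = np.linalg.cholesky(dlyap(A.T, C.T @ C))
    U, s, Vt = np.linalg.svd(Lq.T @ Lp)
    S_inv_half = np.diag(s ** -0.5)
    T, T_inv = Lp @ Vt.T @ S_inv_half, S_inv_half @ U.T @ Lq.T
    return T_inv @ A @ T, T_inv @ B, C @ T, s


def spa(Ab, Bb, Cb, D, r, alpha=1.0):
    """Singular perturbation approximation (3.45) of a balanced realization."""
    A11, A12, A21, A22 = Ab[:r, :r], Ab[:r, r:], Ab[r:, :r], Ab[r:, r:]
    B1, B2, C1, C2 = Bb[:r], Bb[r:], Cb[:, :r], Cb[:, r:]
    Phi = alpha * np.eye(A22.shape[0]) - A22
    Ar = A11 + A12 @ np.linalg.solve(Phi, A21)
    Br = B1 + A12 @ np.linalg.solve(Phi, B2)
    Cr = C1 + C2 @ np.linalg.solve(Phi, A21)
    Dr = D + C2 @ np.linalg.solve(Phi, B2)
    return Ar, Br, Cr, Dr


def G(A, B, C, D, z):
    return D + C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)


rng = np.random.default_rng(3)
n, r = 4, 2
A = rng.standard_normal((n, n))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))   # hypothetical system with rho(A) = 0.8
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = np.zeros((1, 1))

Ab, Bb, Cb, s = balance(A, B, C)
Ar, Br, Cr, Dr = spa(Ab, Bb, Cb, D, r, alpha=1.0)
S1 = np.diag(s[:r])
print(G(A, B, C, D, 1.0), G(Ar, Br, Cr, Dr, 1.0))   # equal steady-state gains
```

Choosing another $\alpha$ on the unit circle moves the point of exact interpolation to $z = \alpha$. For complex $\alpha$, the reduced matrices become complex.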

Before concluding this section, we provide a method of computing Gramians of 
unstable systems. 
Definition 3.10. [168, 186] Suppose that $G(z) = (A, B, C, D)$, where $A$ is possibly unstable but has no eigenvalues on the unit circle. Then the reachability and observability Gramians $P$ and $Q$ are respectively defined by

$$
P = \frac{1}{2\pi} \int_0^{2\pi} (e^{j\theta} I - A)^{-1} B B^T (e^{-j\theta} I - A^T)^{-1} \, d\theta \qquad (3.46)
$$

and

$$
Q = \frac{1}{2\pi} \int_0^{2\pi} (e^{-j\theta} I - A^T)^{-1} C^T C (e^{j\theta} I - A)^{-1} \, d\theta \qquad (3.47)
$$

It should be noted that if $A$ is stable, $P$ and $Q$ above reduce to the standard Gramians of (3.34) and (3.35), respectively.


Lemma 3.9. [168, 186] Suppose that $(A, B)$ is stabilizable and $(C, A)$ is detectable. Let $X$ and $Y$ respectively be the stabilizing solutions of the algebraic Riccati equations

$$
X = A^T \left( X - X B (I_m + B^T X B)^{-1} B^T X \right) A
$$

$$
Y = A \left( Y - Y C^T (I_p + C Y C^T)^{-1} C Y \right) A^T
$$

Also, define

$$
F = -(I_m + B^T X B)^{-1} B^T X A, \quad W W^T = (I_m + B^T X B)^{-1}
$$

and

$$
L = -A Y C^T (I_p + C Y C^T)^{-1}, \quad V^T V = (I_p + C Y C^T)^{-1}
$$

Then the Gramians $P$ and $Q$ respectively satisfy the Lyapunov equations

$$
P = (A + BF) P (A + BF)^T + B (I_m + B^T X B)^{-1} B^T \qquad (3.48)
$$

and

$$
Q = (A + LC)^T Q (A + LC) + C^T (I_p + C Y C^T)^{-1} C \qquad (3.49)
$$

Proof. Consider the following right coprime factorization

$$
(zI - A)^{-1} B = N(z) M^{-1}(z)
$$

where $M(z)$ is an $m \times m$ inner matrix satisfying $M^T(z^{-1}) M(z) = I_m$. Then we have the realization

$$
N(z) = (zI - A_F)^{-1} B W, \quad M(z) = W + F (zI - A_F)^{-1} B W, \quad F \in \mathbb{R}^{m \times n}, \ W \in \mathbb{R}^{m \times m}
$$

where $A_F := A + BF$ is stable. By using the above coprime factorization, we have
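$$
\begin{aligned}
P &= \frac{1}{2\pi} \int_0^{2\pi} N(e^{j\theta}) M^{-1}(e^{j\theta}) M^{-H}(e^{j\theta}) N^H(e^{j\theta}) \, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} N(e^{j\theta}) N^H(e^{j\theta}) \, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} (e^{j\theta} I - A_F)^{-1} B W W^T B^T (e^{-j\theta} I - A_F^T)^{-1} \, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} (e^{j\theta} I - A_F)^{-1} B (I_m + B^T X B)^{-1} B^T (e^{-j\theta} I - A_F^T)^{-1} \, d\theta \\
&= \sum_{k=0}^{\infty} A_F^k B (I_m + B^T X B)^{-1} B^T (A_F^T)^k
\end{aligned}
$$

where the second equality uses $M^H(e^{j\theta}) M(e^{j\theta}) = I_m$, so that $M^{-1} M^{-H} = I_m$ on the unit circle. This shows that $P$ satisfies (3.48). Similarly, let a left coprime factorization be given by

$$
C (zI_n - A)^{-1} = \tilde{M}^{-1}(z) \tilde{N}(z)
$$

where $\tilde{M}(z)$ is a $p \times p$ co-inner matrix with $\tilde{M}(z) \tilde{M}^T(z^{-1}) = I_p$. A realization of $[\tilde{N}(z) \ \ \tilde{M}(z)]$ is then given by

$$
\tilde{N}(z) = V C (zI - A - LC)^{-1}, \quad \tilde{M}(z) = V + V C (zI - A - LC)^{-1} L, \quad L \in \mathbb{R}^{n \times p}, \ V \in \mathbb{R}^{p \times p}
$$

Similarly, we can show that the observability Gramian $Q$ satisfies (3.49).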


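Example 3.4. Suppose that $(A, B, C)$ are given by

$$
A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad C = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \qquad (3.50)
$$

where $A_1$ is stable and $A_2$ is anti-stable, i.e., all the eigenvalues of $A_2$ lie outside the unit circle. Define

$$
P_1 = A_1 P_1 A_1^T + B_1 B_1^T, \quad P_2 = A_2 P_2 A_2^T - B_2 B_2^T
$$

$$
Q_1 = A_1^T Q_1 A_1 + C_1^T C_1, \quad Q_2 = A_2^T Q_2 A_2 - C_2^T C_2
$$

For the anti-stable part, the minus sign gives $P_2 = \sum_{k=1}^{\infty} A_2^{-k} B_2 B_2^T (A_2^T)^{-k} \geq 0$, and similarly $Q_2 \geq 0$.

We wish to show that the reachability Gramian $P$ and the observability Gramian $Q$ of $(A, B, C)$ are given by $P = \operatorname{diag}(P_1, P_2)$ and $Q = \operatorname{diag}(Q_1, Q_2)$, respectively. From (3.46), we have

$$
P = \frac{1}{2\pi} \int_0^{2\pi} (e^{j\theta} I - A)^{-1} B B^T (e^{-j\theta} I - A^T)^{-1} \, d\theta
= \frac{1}{2\pi j} \oint_{|z|=1} (zI - A)^{-1} B B^T (z^{-1} I - A^T)^{-1} \frac{dz}{z}
$$

where $|z| = 1$ denotes the unit circle. According to (3.50), partition $P$ as

$$
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}
$$

Noting that $A_1$ is stable and $A_2$ is anti-stable, we have

$$
\begin{aligned}
P_{11} &= \frac{1}{2\pi j} \oint_{|z|=1} (zI - A_1)^{-1} B_1 B_1^T (z^{-1} I - A_1^T)^{-1} \frac{dz}{z} = P_1 \\
P_{22} &= \frac{1}{2\pi j} \oint_{|z|=1} (zI - A_2)^{-1} B_2 B_2^T (z^{-1} I - A_2^T)^{-1} \frac{dz}{z} = P_2 \\
P_{12} &= \frac{1}{2\pi j} \oint_{|z|=1} (zI - A_1)^{-1} B_1 B_2^T (z^{-1} I - A_2^T)^{-1} \frac{dz}{z} = 0
\end{aligned}
$$

We see that the third integral is zero, since the integrand is analytic in $|z| \geq 1$ and decays as $O(z^{-2})$ for $z \to \infty$. Similarly, $P_{21} = 0$. Hence we have $P = \operatorname{diag}(P_1, P_2)$. That $Q = \operatorname{diag}(Q_1, Q_2)$ is proved in the same way.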
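The conclusion of this example can be checked by evaluating (3.46) numerically. The sketch below assumes NumPy and hypothetical blocks $A_1$ (stable, $2 \times 2$) and $A_2 = 2$ (anti-stable). The trapezoidal rule converges very quickly for this smooth periodic integrand. The anti-stable block of the result satisfies $P_2 = A_2 P_2 A_2^T - B_2 B_2^T$, and the off-diagonal blocks vanish.

```python
import numpy as np


def dlyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (needs lambda_i(A) lambda_j(A) != 1)."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)


# Hypothetical data for (3.50): A1 stable (2 x 2), A2 anti-stable (1 x 1)
A1 = np.array([[0.5, 0.2], [0.0, -0.3]])
B1 = np.array([[1.0], [1.0]])
A2 = np.array([[2.0]])
B2 = np.array([[0.5]])
A = np.block([[A1, np.zeros((2, 1))], [np.zeros((1, 2)), A2]])
B = np.vstack([B1, B2])

# P of (3.46) by the trapezoidal rule; the integrand is smooth and periodic,
# so the rule converges very quickly in N
N = 512
P = np.zeros((3, 3), dtype=complex)
for theta in 2 * np.pi * np.arange(N) / N:
    F = np.linalg.solve(np.exp(1j * theta) * np.eye(3) - A, B)  # (e^{j theta} I - A)^{-1} B
    P += F @ F.conj().T / N
P = P.real

P1 = dlyap(A1, B1 @ B1.T)     # P1 = A1 P1 A1^T + B1 B1^T
P2 = dlyap(A2, -B2 @ B2.T)    # P2 = A2 P2 A2^T - B2 B2^T
print(np.round(P, 6))
print(P1, P2)
```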


3.9 Realization Theory 


In this section, we prove basic realization results, which will be used in Chapter 6 to 
discuss the deterministic realization theory. 

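Consider an infinite sequence $Y = (Y_1, Y_2, \ldots)$ with $Y_i \in \mathbb{R}^{p \times m}$. Let the infinite matrix formed by $Y_i$, $i = 1, 2, \ldots$ be given by

$$
H = \begin{bmatrix}
Y_1 & Y_2 & Y_3 & \cdots \\
Y_2 & Y_3 & Y_4 & \cdots \\
Y_3 & Y_4 & Y_5 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix} \qquad (3.51)
$$

This is called an infinite block Hankel matrix. By using the shift operator $\sigma$, we define $\sigma^k Y = (Y_{k+1}, Y_{k+2}, \ldots)$ and the block Hankel matrix

$$
\sigma^k H = \begin{bmatrix}
Y_{k+1} & Y_{k+2} & Y_{k+3} & \cdots \\
Y_{k+2} & Y_{k+3} & Y_{k+4} & \cdots \\
Y_{k+3} & Y_{k+4} & Y_{k+5} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}, \quad k = 0, 1, \ldots \qquad (3.52)
$$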


Definition 3.11. If $(A, B, C)$ satisfies

$$
Y_i = C A^{i-1} B, \quad i = 1, 2, \ldots \qquad (3.53)
$$

then the triplet $(A, B, C)$ is called a realization of $H$.
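Let the $k \times l$ block submatrix appearing in the upper-left corner of the infinite Hankel matrix $H$ be given by

$$
H_{k,l} = \begin{bmatrix}
Y_1 & Y_2 & \cdots & Y_l \\
Y_2 & Y_3 & \cdots & Y_{l+1} \\
\vdots & \vdots & & \vdots \\
Y_k & Y_{k+1} & \cdots & Y_{k+l-1}
\end{bmatrix} \qquad (3.54)
$$

Moreover, we define the $k$-observability matrix and the $l$-reachability matrix as

$$
\mathcal{O}_k = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1} \end{bmatrix}, \quad
\mathcal{C}_l = \begin{bmatrix} B & AB & \cdots & A^{l-1}B \end{bmatrix} \qquad (3.55)
$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$. If $k > n$ (or $l > n$), then $\mathcal{O}_k$ (or $\mathcal{C}_l$) is called an extended observability (or reachability) matrix, and the $n$-observability (or $n$-reachability) matrix is simply called the observability (or reachability) matrix.

Let $\alpha$ be the smallest positive integer such that $\operatorname{rank}(\mathcal{C}_{\alpha+1}) = \operatorname{rank}(\mathcal{C}_\alpha)$. Then this value of $\alpha$ is called the reachability index of $\Sigma = (A, B, C)$. Similarly, the smallest positive integer $\beta$ such that $\operatorname{rank}(\mathcal{O}_{\beta+1}) = \operatorname{rank}(\mathcal{O}_\beta)$ is referred to as the observability index.

Lemma 3.10. If $(A, B, C)$ is a realization of $H$, then

$$
H_{k,l} = \mathcal{O}_k \mathcal{C}_l, \quad k, l = 1, 2, \ldots \qquad (3.56)
$$

holds, and vice versa. In this case, we have the following rank conditions.

$$
\operatorname{rank}(H_{k,l}) \leq \min\{\operatorname{rank}(\mathcal{O}_k), \operatorname{rank}(\mathcal{C}_l)\} \leq n
$$

Proof. Equation (3.56) is clear from Definition 3.11 and (3.55). The inequalities above are also obvious from (3.56).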


From the canonical decomposition theorem (Theorem 3.11), if there exists a realization of $H$, then there exists a minimal realization. Thus we have the following lemma.


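Lemma 3.11. If $(A, B, C)$ is a minimal realization, we have

$$
\operatorname{rank}(H_{k,l}) = n, \quad k, l = n, n+1, \ldots \qquad (3.57)
$$

Proof. For $k, l \geq n$, it follows that $\operatorname{rank}(\mathcal{O}_k) = n$ and $\operatorname{rank}(\mathcal{C}_l) = n$. Hence we get

$$
\mathcal{O}_k^T \mathcal{O}_k > 0, \quad \mathcal{C}_l \mathcal{C}_l^T > 0
$$

From (3.56), this implies that

$$
n = \operatorname{rank}(\mathcal{O}_k^T \mathcal{O}_k \mathcal{C}_l \mathcal{C}_l^T) = \operatorname{rank}(\mathcal{O}_k^T H_{k,l} \mathcal{C}_l^T) \leq \operatorname{rank}(H_{k,l})
$$

Thus $\operatorname{rank}(H_{k,l}) \geq n$ holds. But from Lemma 3.10, we have $\operatorname{rank}(H_{k,l}) \leq n$. This completes the proof.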


Now we define the rank of an infinite Hankel matrix. It should be noted, however, that this rank cannot be determined by a finite-step procedure.

Definition 3.12. The rank of the infinite block Hankel matrix $H$ of (3.51) is defined by

$$
\operatorname{rank}(H) = \sup_{k, l} \operatorname{rank}(H_{k,l})
$$


We consider the realizability condition of an infinite impulse response sequence in terms of the concept of a recursive sequence. Suppose that there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ such that

$$
Y_{n+k+1} + \sum_{i=1}^{n} \alpha_{n-i+1} Y_{k+i} = 0, \quad k = 0, 1, \ldots \qquad (3.58)
$$

holds. In this case, we say that $Y = (Y_1, Y_2, \ldots)$ is a recursive sequence of order $n$.
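For a sequence generated by a triplet, $Y_i = CA^{i-1}B$, the coefficients of the characteristic polynomial of $A$ satisfy (3.58). This is the Cayley-Hamilton argument used in the proof of Theorem 3.13 below. The sketch below assumes NumPy and a hypothetical random triplet, and it evaluates the left-hand side of (3.58) for several $k$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 3, 2, 1
# Hypothetical system; any (A, B, C) works here
A = 0.6 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

alpha = np.poly(A).real[1:]   # p_A(z) = z^n + alpha[0] z^{n-1} + ... + alpha[n-1]
K = 12
Y = [C @ np.linalg.matrix_power(A, i) @ B for i in range(K)]   # Y[i] holds Y_{i+1} = C A^i B

# Left-hand side of (3.58): Y_{n+k+1} + alpha_1 Y_{n+k} + ... + alpha_n Y_{k+1}
residuals = [Y[n + k] + sum(alpha[i] * Y[n + k - 1 - i] for i in range(n))
             for k in range(K - n)]
print(max(np.abs(R).max() for R in residuals))   # round-off level
```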


The following theorem gives an important result for the realizability of the infinite block Hankel matrix, which is an extension of Lemma 2.14 to the matrix case.

Theorem 3.13. An infinite block Hankel matrix $H$ is realizable if and only if $Y$ is recursive.


Proof. To prove the necessity, let $Y_i = C A^{i-1} B$, $i = 1, 2, \ldots$, and let the characteristic polynomial of $A$ be given by

$$
p_A(z) = z^n + \alpha_1 z^{n-1} + \cdots + \alpha_{n-1} z + \alpha_n
$$

Then, from the Cayley-Hamilton theorem,

$$
A^n + \alpha_1 A^{n-1} + \cdots + \alpha_{n-1} A + \alpha_n I = 0
$$

Pre-multiplying this by $C A^k$ and post-multiplying by $B$ yield (3.58).
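The sufficiency will be proved by constructing a realization and then a minimal realization. To this end, we consider the block Hankel matrix $\sigma^k H$ of (3.52). Let the $n \times n$ block submatrix appearing in the upper-left corner of $\sigma^k H$ be given by

$$
(\sigma^k H)_{n,n} = \begin{bmatrix}
Y_{k+1} & Y_{k+2} & \cdots & Y_{k+n} \\
Y_{k+2} & Y_{k+3} & \cdots & Y_{k+n+1} \\
\vdots & \vdots & & \vdots \\
Y_{k+n} & Y_{k+n+1} & \cdots & Y_{k+2n-1}
\end{bmatrix} \in \mathbb{R}^{pn \times mn}
$$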


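Also, let the block companion matrix be defined by

$$
M = \begin{bmatrix}
0 & I_p & 0 & \cdots & 0 \\
0 & 0 & I_p & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & I_p \\
-\alpha_n I_p & -\alpha_{n-1} I_p & -\alpha_{n-2} I_p & \cdots & -\alpha_1 I_p
\end{bmatrix} \in \mathbb{R}^{pn \times pn}
$$

It follows from (3.58) that

$$
M (\sigma^k H)_{n,n} = (\sigma^{k+1} H)_{n,n}, \quad k = 0, 1, \ldots
$$

Hence we have

$$
M^k H_{n,n} = (\sigma^k H)_{n,n}, \quad k = 0, 1, \ldots \qquad (3.59)
$$

Since the (1, 1)-block element of $(\sigma^k H)_{n,n}$ is just $Y_{k+1}$, we get

$$
Y_{k+1} = \begin{bmatrix} I_p & 0 & \cdots & 0 \end{bmatrix} M^k H_{n,n} \begin{bmatrix} I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad k = 0, 1, \ldots
$$

For notational convenience, we define $E_p^T = [I_p \ 0 \ \cdots \ 0] \in \mathbb{R}^{p \times pn}$ and

$$
E_m = \begin{bmatrix} I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{R}^{mn \times m}, \quad
B = H_{n,n} E_m = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \in \mathbb{R}^{pn \times m}
$$

Also, define $A = M$ and $C = E_p^T$. Then it follows that

$$
Y_{k+1} = E_p^T M^k H_{n,n} E_m = C A^k B, \quad k = 0, 1, \ldots \qquad (3.60)
$$

This shows that $(A, B, C)$ is a (non-minimal) realization with $A \in \mathbb{R}^{pn \times pn}$.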
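We derive a minimal realization from this non-minimal realization $(A, B, C)$. Define the block companion matrix

$$
N = \begin{bmatrix}
0 & 0 & \cdots & 0 & -\alpha_n I_m \\
I_m & 0 & \cdots & 0 & -\alpha_{n-1} I_m \\
0 & I_m & \cdots & 0 & -\alpha_{n-2} I_m \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I_m & -\alpha_1 I_m
\end{bmatrix} \in \mathbb{R}^{mn \times mn}
$$

Then, similarly to the procedure of deriving (3.59),

$$
H_{n,n} N^k = (\sigma^k H)_{n,n}, \quad k = 0, 1, \ldots \qquad (3.61)
$$

Hence, from (3.59) and (3.61), we have $M^k H_{n,n} = H_{n,n} N^k$, $k = 0, 1, \ldots$.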
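Suppose that $\operatorname{rank}(H_{n,n}) = r$. Let the SVD of $H_{n,n}$ be given by

$$
H_{n,n} = U \Sigma V^T = \begin{bmatrix} U_r & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r^T \\ V_2^T \end{bmatrix} = U_r \Sigma_r V_r^T \in \mathbb{R}^{pn \times mn}
$$

where $\Sigma_r > 0$, $\Sigma_r \in \mathbb{R}^{r \times r}$, and $U_r^T U_r = I_r$, $V_r^T V_r = I_r$. From Lemma 2.10, the pseudo-inverse of $H_{n,n}$ is given by $H_{n,n}^\dagger = V_r \Sigma_r^{-1} U_r^T$, so that $H_{n,n} H_{n,n}^\dagger = U_r U_r^T$ and $H_{n,n}^\dagger H_{n,n} = V_r V_r^T$.

Since $M H_{n,n} = H_{n,n} N$, the matrix $M$ maps $\operatorname{Im}(H_{n,n}) = \operatorname{Im}(U_r)$ into itself. Hence $M U_r = U_r U_r^T M U_r$ and, inductively, $M^k U_r = U_r (U_r^T M U_r)^k$. By using the SVD, $Y_{k+1}$ of (3.60) is therefore computed as

$$
\begin{aligned}
Y_{k+1} &= E_p^T M^k U_r \Sigma_r V_r^T E_m = E_p^T U_r (U_r^T M U_r)^k \Sigma_r V_r^T E_m \\
&= (E_p^T U_r \Sigma_r^{1/2}) (\Sigma_r^{-1/2} U_r^T M U_r \Sigma_r^{1/2})^k (\Sigma_r^{1/2} V_r^T E_m)
\end{aligned}
$$

Define $A := \Sigma_r^{-1/2} U_r^T M H_{n,n} V_r \Sigma_r^{-1/2} \in \mathbb{R}^{r \times r}$, $B := \Sigma_r^{1/2} V_r^T E_m \in \mathbb{R}^{r \times m}$, and $C := E_p^T U_r \Sigma_r^{1/2} \in \mathbb{R}^{p \times r}$. Since $H_{n,n} V_r = U_r \Sigma_r$, we have $A = \Sigma_r^{-1/2} U_r^T M U_r \Sigma_r^{1/2}$, and hence $Y_{k+1} = C A^k B$, $k = 0, 1, \ldots$. Since $\operatorname{rank}(H_{n,n}) = r$, Lemma 3.10 shows that no realization has dimension less than $r$. Therefore $(A, B, C)$ is a minimal realization with $A \in \mathbb{R}^{r \times r}$.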
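The SVD step above translates directly into an algorithm, commonly known as the Ho-Kalman algorithm. The sketch below assumes NumPy, and the function name `svd_realization` is hypothetical. It uses $(\sigma H)_{k,k}$ directly in place of $M H_{k,k}$, which is the same matrix by (3.59). The rank $r$ is decided by a relative singular-value threshold, which is an assumption suited to exact (noise-free) data. The test sequence comes from a hypothetical random system of order 3.

```python
import numpy as np


def svd_realization(Y, p, m, k, tol=1e-8):
    """Minimal realization from Y[0..2k-1] = (Y_1, ..., Y_{2k}), a sketch following
    the SVD step in the proof of Theorem 3.13 (often called the Ho-Kalman algorithm).

    (sigma H)_{k,k} is used in place of M H_{k,k}, which is the same matrix by (3.59).
    """
    H = np.block([[Y[i + j] for j in range(k)] for i in range(k)])        # H_{k,k}
    Hs = np.block([[Y[i + j + 1] for j in range(k)] for i in range(k)])   # (sigma H)_{k,k}
    U, s, Vt = np.linalg.svd(H)
    r = int(np.sum(s > tol * s[0]))           # numerical rank of H_{k,k}
    Ur, Vr = U[:, :r], Vt[:r].T
    S_half = np.diag(np.sqrt(s[:r]))
    S_inv_half = np.diag(1.0 / np.sqrt(s[:r]))
    A = S_inv_half @ Ur.T @ Hs @ Vr @ S_inv_half
    B = (S_half @ Vr.T)[:, :m]                # Sigma^{1/2} V^T E_m
    C = (Ur @ S_half)[:p, :]                  # E_p^T U Sigma^{1/2}
    return A, B, C


rng = np.random.default_rng(5)
n_true, m, p, k = 3, 2, 2, 4
A0 = rng.standard_normal((n_true, n_true))
A0 *= 0.9 / max(abs(np.linalg.eigvals(A0)))   # hypothetical true system, rho = 0.9
B0 = rng.standard_normal((n_true, m))
C0 = rng.standard_normal((p, n_true))
Y = [C0 @ np.linalg.matrix_power(A0, i) @ B0 for i in range(30)]

A, B, C = svd_realization(Y[:2 * k], p, m, k)
err = max(np.abs(C @ np.linalg.matrix_power(A, i) @ B - Y[i]).max() for i in range(30))
print(A.shape, err)
```

Only the first $2k$ Markov parameters are used. The recovered triplet nevertheless reproduces all later ones, as the minimality argument predicts.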


It follows from this theorem that $H$ is realizable if and only if the rank of $H$ is finite. If $\operatorname{rank}(H) = n$, then the dimension of a minimal realization is also $n$. It should be noted, however, that this statement cannot be verified from a finite amount of data. But we have the following theorem in this connection.

Theorem 3.14. Suppose that for $Y_i$, $i = 1, \ldots, 2n$, the following conditions are satisfied:

$$
\operatorname{rank}(H_{n,n}) = \operatorname{rank}(H_{n+1,n}) = \operatorname{rank}(H_{n,n+1}) = n
$$

Then there exists a unique Markov sequence $Y = (Y_1, Y_2, \ldots)$ with rank $n$ such that its first $2n$ parameters exactly equal the given $Y_i$, $i = 1, \ldots, 2n$.

Proof. From $\operatorname{rank}(H_{n,n}) = \operatorname{rank}(H_{n+1,n})$, the last $p$ rows of $H_{n+1,n}$ must be linear combinations of the rows of $H_{n,n}$. Hence there exist $p \times p$ matrices $C_i$, $i = 1, \ldots, n$ such that


$$
Y_j = C_1 Y_{j-1} + \cdots + C_n Y_{j-n}, \quad j = n+1, \ldots, 2n \qquad (3.62)
$$


Similarly, from $\operatorname{rank}(H_{n,n}) = \operatorname{rank}(H_{n,n+1})$, we see that the last m columns of $H_{n,n+1}$ must be linear combinations of the columns of $H_{n,n}$. Thus, there exist $m \times m$ matrices $D_i$, $i = 1, \ldots, n$ such that


$$Y_j = Y_{j-1} D_1 + \cdots + Y_{j-n} D_n, \qquad j = n+1, \ldots, 2n \qquad (3.63)$$
Now we recursively define $Y_j$, $j = 2n+1, \ldots$ by means of (3.62), so that we have an infinite sequence $Y = (Y_1, Y_2, \ldots)$. By this construction, every block row of the infinite block Hankel matrix H beyond the nth is a linear combination of the preceding n block rows, so that H has rank at most pn. We show that the rank is in fact n. To this end, we show that (3.63) also holds for $j = 2n+1, 2n+2, \ldots$. Suppose that (3.63) holds up to some $j \geq 2n$. Then, from (3.62) and (3.63),


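$$\begin{aligned}
Y_{j+1} &= \sum_{i=1}^{n} C_i Y_{j+1-i} = \sum_{i=1}^{n} C_i \sum_{k=1}^{n} Y_{j+1-i-k} D_k \\
&= \sum_{k=1}^{n} \Big( \sum_{i=1}^{n} C_i Y_{j+1-k-i} \Big) D_k = \sum_{k=1}^{n} Y_{j+1-k} D_k
\end{aligned}$$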


Thus the columns of H are linearly dependent on the first mn columns, and hence we have $\operatorname{rank}(H) = \operatorname{rank}(H_{n,n}) = n$.

Finally, the uniqueness is proved as follows. Suppose that $Y^1$ and $Y^2$ are two Markov sequences with the above properties, and define $Y := Y^1 - Y^2$. Then we see that the rank of Y is at most 2n and that the first 2n parameters are zero. Therefore, from Theorem 3.13, applying (3.58) with $n := 2n$, we have $Y = 0$. This completes the proof.

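The existence part of the proof of Theorem 3.14 is constructive. As a minimal sketch, assuming the scalar case $p = m = 1$, the following Python code solves (3.62) for the coefficients $c_1, \ldots, c_n$ and then extends the sequence recursively. In the scalar case the three rank conditions reduce to $H_{n,n}$ being nonsingular, since $H_{n+1,n}$ and $H_{n,n+1}$ have only n columns and n rows, respectively. Exact rational arithmetic via the standard `fractions` module avoids rounding; the helper names `solve_linear` and `extend_markov` are hypothetical.

```python
from fractions import Fraction


def solve_linear(M, rhs):
    """Solve M c = rhs exactly by Gauss-Jordan elimination; raise if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("H_{n,n} is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def extend_markov(y, n, count):
    """Extend scalar Markov parameters y = [Y_1, ..., Y_2n] by the recursion (3.62).

    Returns [Y_1, ..., Y_{2n+count}] as Fractions.
    """
    if len(y) != 2 * n:
        raise ValueError("need exactly 2n parameters")
    # Equations Y_j = c_1 Y_{j-1} + ... + c_n Y_{j-n}, j = n+1, ..., 2n.
    # Python index: Y_j is stored in y[j - 1].
    rows = [[y[j - 1 - i] for i in range(1, n + 1)] for j in range(n + 1, 2 * n + 1)]
    rhs = [y[j - 1] for j in range(n + 1, 2 * n + 1)]
    c = solve_linear(rows, rhs)
    seq = [Fraction(v) for v in y]
    for _ in range(count):
        j = len(seq) + 1  # index of the parameter Y_j being generated
        seq.append(sum(c[i - 1] * seq[j - 1 - i] for i in range(1, n + 1)))
    return seq
```

For instance, `extend_markov([1, 1, 2, 3], 2, 3)` extends the first four Fibonacci numbers to 1, 1, 2, 3, 5, 8, 13. A multivariable version would instead solve for the $p \times p$ matrices $C_i$ from the rows of $H_{n,n}$.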

3.10 Notes and References 


• After a brief review of the z-transform in Section 3.1, we have introduced discrete-time systems and signals, together with their norms, in Sections 3.2 and 3.3. References [98, 121, 144] are used for systems and signals, and [36] for complex function theory.


• In Sections 3.4 to 3.7, state-space methods are considered, including Lyapunov stability, reachability and observability of discrete-time LTI systems. In relation to realization and system identification, the canonical decomposition theorem (Theorem 3.11) is theoretically the most important, because it tells us that the transfer matrix of an LTI system depends only on its reachable and observable subsystem. Thus the unreachable or unobservable parts of the system are irrelevant to system identification. The basic references are [27, 80, 185].


• In Section 3.8, we present balanced realization theory and model reduction techniques for discrete-time LTI systems based on [5, 108, 127, 168, 186]. It is well known that for continuous-time systems, a reduced-order model derived from a balanced realization by retaining the principal modes is balanced, but this fact is no longer true for discrete-time systems. However, by using Lyapunov inequalities rather than Lyapunov equations, it is shown in [71, 185] that reduced-order balanced models are obtained from higher-order balanced models. This model reduction theory and these techniques are employed in Chapter 8 to develop the theory of balanced stochastic realization, and in Chapter 11 to compute reduced-order models in closed-loop identification algorithms.

• The basic results for the realization theory treated in Section 3.9 are found in [72, 85, 147]. The proof of Theorem 3.13 is based on [85] and the SVD technique due to [184], and this theorem is the basis of the classical deterministic realization theory to be developed in Chapter 6.


Figure 3.3. A diagonal system with n = 3 


3.11 Problems 


3.1 Suppose that the impulse response of G(z) is given by
$$g_k = \frac{(-1)^{k-1}}{k}, \qquad k = 1, 2, \ldots$$
with $g_0 = 0$. Consider the stability of this system by means of Theorem 3.1.


3.2 Find a necessary and sufficient condition under which the second-order polynomial $f(z) := z^2 + a_1 z + a_2$ is stable. Note that a polynomial is called stable if all its roots lie inside the unit circle.

3.3 Derive a state space model for the system shown in Figure 3.3, and obtain the 
reachability condition. 


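3.4 Consider a realization $(A, b, c)$ of a SISO system with $(A, b)$ reachable. Show that $\bar A = \mathcal{C}^{-1} A \mathcal{C}$ and $\bar b = \mathcal{C}^{-1} b$ are given by
$$\bar A = \begin{bmatrix} 0 & \cdots & 0 & -a_n \\ 1 & \ddots & \vdots & -a_{n-1} \\ & \ddots & 0 & \vdots \\ & & 1 & -a_1 \end{bmatrix}, \qquad \bar b = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
where $\mathcal{C}$ is the reachability matrix and $\det(zI - A) = z^n + a_1 z^{n-1} + \cdots + a_n$.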
3.5 Show that $(A, B)$ is reachable (stabilizable) if and only if $(A + BK, B)$ is reachable (stabilizable) for any K. Also, show that $(C, A)$ is observable (detectable) if and only if $(C, A + LC)$ is observable (detectable) for any L.
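3.6 [73] Let $A \in \mathbb{R}^{n \times n}$. Show that for any $\epsilon > 0$, there exists a constant $C > 0$ such that
$$|(A^k)_{ij}| \leq C (\rho(A) + \epsilon)^k, \qquad k = 0, 1, \ldots$$
where $i, j = 1, \ldots, n$. Recall that $\rho(A)$ is the spectral radius (see Lemma 2.1).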


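3.7 Consider a discrete-time LTI system of the form
$$x(t+1) = A x(t) + f(t), \qquad t = 0, 1, \ldots$$
where $f(t) \in \mathbb{R}^n$ is an exogenous input, and $A \in \mathbb{R}^{n \times n}$ is stable. Show that if $\|f(t)\| \to 0$, then $x(t)$ converges to zero as $t \to \infty$.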


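3.8 Define the system matrix
$$S(z) = \begin{bmatrix} zI - A & -B \\ C & D \end{bmatrix}$$

Show that the following equality holds:
$$\operatorname{rank}_n S(z) = n + \operatorname{rank}_n G(z)$$
where $G(z) = C(zI - A)^{-1} B + D$, and $\operatorname{rank}_n$ denotes the maximal rank over $z \in \mathbb{C}$; this rank is called the normal rank.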


3.9 [51] Consider the Hankel matrix H of (3.51) with scalar elements $Y_i = h_i$, $i = 1, 2, \ldots$. Show that H has finite rank if and only if the series
$$R(z) = \frac{h_1}{z} + \frac{h_2}{z^2} + \cdots$$
is a rational function of z. Moreover, show that the rank of H is equal to the number of poles of $R(z)$, counted with multiplicity.


3.10 Show that the sequence $\{g_k, k = 1, 2, \ldots\}$ in Problem 3.1 cannot have a finite-dimensional realization.


4 


Stochastic Processes 


This chapter is concerned with discrete-time stochastic processes and linear dynamic 
systems with random inputs. We introduce basic properties of stochastic processes, 
including stationarity, Markov and ergodic properties. We then study the spectral 
analysis of stationary stochastic processes. By defining Hilbert spaces generated by 
stationary processes, we discuss the optimal prediction problem for stationary pro- 
cesses. Finally, we turn to the study of linear stochastic systems driven by white 
noises, or Markov models, which play important roles in prediction, filtering and 
system identification. We also introduce backward Markov models for stationary 
processes. 


4.1 Stochastic Processes 


Consider a physical variable x that evolves in time in a manner governed by some probabilistic laws. There are many examples of such variables, including thermal noise in electrical circuits, radar signals, random motions of ships due to ocean waves, temperature and pressure variations in chemical reactors, stock prices, waves observed in earthquakes, etc. The collection of all possible variations in time of any such variable is called a stochastic process, or a time series.

To be more precise, a stochastic process is a family of real-valued (or complex-valued) time functions, implying that a stochastic process is composed of a collection or ensemble of random variables over an index set, say, T. Let Ω be a sample space appropriately defined for the experiment under consideration. Then, a stochastic process is expressed as $\{x(t, \omega), t \in T\}$, where $\omega \in \Omega$. For a fixed $t = t_1$, we have a random variable $x(t_1, \cdot)$ on the sample space Ω. Also, if we fix $\omega = \omega_1$, then $x(\cdot, \omega_1)$ is a function of time called a sample function. Since this definition of a stochastic process is very general, we usually assume a suitable statistical (or dynamic) model with a finite number of parameters for analyzing a random phenomenon (or system) of interest.

If the index set is $\mathbb{R} = (-\infty, \infty)$, or an interval $[a, b] \subset \mathbb{R}$, then the process is called a continuous-time stochastic process. If, on the other hand, the index set is $\mathbb{Z} = \{t = 0, \pm 1, \ldots\}$, we have a discrete-time stochastic process, or a time series. In this book, we consider discrete-time stochastic processes, so that they are expressed as $\{x(t), t = 0, \pm 1, \ldots\}$, $\{x(t)\}$, or simply x by suppressing the stochastic parameter $\omega \in \Omega$.

Consider the distribution of a stochastic process $\{x(t)\}$. Let $t_1, \ldots, t_k$ be k time instants. Then, for $a_1, \ldots, a_k \in \mathbb{R}$, the joint distribution of $x(t_1), \ldots, x(t_k)$ is defined by
$$P\{x(t_1) \leq a_1, \ldots, x(t_k) \leq a_k\} = \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_k} p_{t_1, \ldots, t_k}(x_1, \ldots, x_k) \, dx_1 \cdots dx_k \qquad (4.1)$$
where $p_{t_1, \ldots, t_k}(x_1, \ldots, x_k)$ is the joint probability density function of $x(t_1), \ldots, x(t_k)$. The joint distribution of (4.1) is called a finite-dimensional distribution of the stochastic process at $t_1, \ldots, t_k$. The distribution of a stochastic process is determined by all its finite-dimensional distributions of the form (4.1). In particular, if every finite-dimensional distribution of x is Gaussian, then the distribution of x is called Gaussian.


Example 4.1. A stochastic process $\{v(t), t = 0, 1, \ldots\}$ is called a white noise if $v(t)$ and $v(s)$ are independent for any $t \neq s$, i.e.,
$$p_{t,s}(v(t), v(s)) = p_t(v(t)) \, p_s(v(s)), \qquad t \neq s$$


The white noise is conveniently used for generating various processes with different stochastic properties. For example, a random walk x(t) is expressed as a sum of white noises
$$x(t) = v(1) + v(2) + \cdots + v(t), \qquad t = 1, 2, \ldots \qquad (4.2)$$
with $x(0) = 0$. It thus follows from (4.2) that
$$x(t) = x(t-1) + v(t), \qquad x(0) = 0 \qquad (4.3)$$

Statistical properties of the random walk are considered in Example 4.3.


4.1.1 Markov Processes 


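Let $\{x(t), t = 0, \pm 1, \ldots\}$ be a stochastic process. We introduce the minimal σ-algebra that makes $\{x(s), s \leq t\}$ measurable, denoted by $\mathcal{F}_t = \sigma\{x(s), s \leq t\}$. The σ-algebra $\mathcal{F}_t$ satisfies $\mathcal{F}_{t_1} \subset \mathcal{F}_{t_2}$, $t_1 < t_2$, and the family $\{\mathcal{F}_t\}$ is called a filtration. $\mathcal{F}_t$ contains all the information carried by $x(t), x(t-1), \ldots$.

Suppose that for $a \in \mathbb{R}$ and $t_{k+1} > t_k$, we have
$$P\{x(t_{k+1}) \leq a \mid \mathcal{F}_{t_k}\} = P\{x(t_{k+1}) \leq a \mid x(t_k)\} \qquad (4.4)$$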
Then we say that $\{x(t)\}$ has the Markov property. In terms of the conditional probability density functions, the Markov property is written as
$$p(x(t_k) \mid x(t_{k-1}), \ldots, x(t_1)) = p(x(t_k) \mid x(t_{k-1})) \qquad (4.5)$$

A stochastic process with the Markov property is called a Markov process. The random walk x(t) in Example 4.1 is a Markov process, since for any $t > s > 0$, the increment $x(t) - x(s-1) = v(s) + \cdots + v(t)$ is independent of $x(1), \ldots, x(s-1)$, so that
$$p(x(t) \mid x(s-1), \ldots, x(1)) = p(x(t) \mid x(s-1))$$


Let $t_1 < t_2 < \cdots < t_k < t_{k+1}$. Then, for a Markov process, the conditional probability of $x(t_{k+1})$ given $\mathcal{F}_{t_k}$ depends only on $x(t_k)$, and is independent of the information $\mathcal{F}_{t_{k-1}}$. Let $\mathcal{F}_{t_{k-1}}$ be the past, $x(t_k)$ the present, and $x(t_{k+1})$ the future. Then, for Markov processes, the past and the future are conditionally independent given the present state $x(t_k)$. Also, by using Bayes' rule, the joint probability density function of a Markov process is expressed as
$$\begin{aligned}
p(x(t_1), \ldots, x(t_k)) &= p(x(t_k) \mid x(t_1), \ldots, x(t_{k-1})) \, p(x(t_1), \ldots, x(t_{k-1})) \\
&= p(x(t_k) \mid x(t_{k-1})) \, p(x(t_1), \ldots, x(t_{k-1}))
\end{aligned}$$

Continuing this procedure, we get
$$p(x(t_1), \ldots, x(t_k)) = p(x(t_1)) \prod_{i=2}^{k} p(x(t_i) \mid x(t_{i-1}))$$

We therefore see that the joint probability density function of a Markov process is determined by the first-order probability density function $p(x(t_1))$ and the transition probability density functions $p(x(t_i) \mid x(t_{i-1}))$. Also, since
$$p(x(t_i) \mid x(t_{i-1})) = \frac{p(x(t_i), x(t_{i-1}))}{p(x(t_{i-1}))}$$
we can say that the distribution of a Markov process is determined by the first- and second-order probability density functions $p(x(t_i))$ and $p(x(t_i), x(t_{i-1}))$.


4.1.2 Means and Covariance Matrices 


Let $\{x(t), t = 0, \pm 1, \ldots\}$ be a stochastic process. Given the distribution (4.1) of $\{x(t)\}$, we can compute various expectations associated with the stochastic process. In particular, the expectation of the product $x(t_1) \cdots x(t_k)$ is called the kth-order moment function, which is given by
$$M(t_1, \ldots, t_k) = E\{x(t_1) \cdots x(t_k)\} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 \cdots x_k \, p_{t_1, \ldots, t_k}(x_1, \ldots, x_k) \, dx_1 \cdots dx_k$$




where $E\{\cdot\}$ denotes the mathematical expectation.

In the following, we are mainly interested in the first- and second-order moment functions, which are respectively called the mean function and the (auto-) covariance function; they are written as
$$\mu_x(t) = E\{x(t)\}, \qquad \Lambda_{xx}(t, s) = E\{[x(t) - \mu_x(t)][x(s) - \mu_x(s)]\}$$
The covariance function is also written as $\operatorname{cov}\{x(t), x(s)\}$, and in particular,
$$\sigma_x^2(t) = \operatorname{cov}\{x(t), x(t)\} = \Lambda_{xx}(t, t)$$
is called the variance of x(t). Stochastic processes with finite variances are called second-order processes.


Example 4.2. Let $\{x(t), t = 0, \pm 1, \ldots\}$ be a Gaussian stochastic process. Let the mean and covariance functions be given by $\mu_x(t)$ and $\sigma(t, s)$, respectively. Then, the joint probability density function of $x(t_1), \ldots, x(t_k)$ is expressed as
$$p_{t_1, \ldots, t_k}(x_1, \ldots, x_k) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} (x_i - \mu_x(t_i))(x_j - \mu_x(t_j)) \Big\}$$
where $\Sigma = (\sigma(t_i, t_j)) \in \mathbb{R}^{k \times k}$ is the covariance matrix, and $\sigma^{ij}$ denotes the $(i, j)$-element of the inverse $\Sigma^{-1}$. If x(t) is a white noise, we see that Σ becomes a diagonal matrix.

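As a numerical illustration of Example 4.2, the sketch below evaluates the joint Gaussian density using a Cholesky factor $\Sigma = L L^T$: with $L z = x - \mu$, the quadratic form $\sum_{i,j} \sigma^{ij}(x_i - \mu_x(t_i))(x_j - \mu_x(t_j))$ equals $z^T z$ and $|\Sigma| = (\prod_i L_{ii})^2$. The Cholesky route is an implementation choice assumed here, not taken from the text, and the names `cholesky` and `gaussian_density` are hypothetical. With a diagonal Σ, as for a white noise, the result factors into a product of one-dimensional densities.

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][r] * L[j][r] for r in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def gaussian_density(x, mu, S):
    """Joint density p_{t_1..t_k}(x_1..x_k) of Example 4.2.

    x and mu are length-k lists (values x_i and means mu_x(t_i));
    S is the k x k covariance matrix (sigma(t_i, t_j)).
    """
    k = len(x)
    L = cholesky(S)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution for L z = d; then z^T z is the quadratic form.
    z = []
    for i in range(k):
        z.append((d[i] - sum(L[i][r] * z[r] for r in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    det = math.prod(L[i][i] for i in range(k)) ** 2  # |Sigma|
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (k / 2) * math.sqrt(det))
```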

Before concluding this section, we briefly discuss vector stochastic processes. Let $\{x(t), t = 0, \pm 1, \ldots\}$ be an n-dimensional vector process, i.e.,
$$x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}$$
where $x_i(t)$ are scalar stochastic processes. Then we can respectively define the mean vector and covariance matrix as
$$\mu_x(t) = \begin{bmatrix} \mu_{x_1}(t) \\ \vdots \\ \mu_{x_n}(t) \end{bmatrix}$$
and
$$\Lambda_{xx}(t, s) = E\{[x(t) - \mu_x(t)][x(s) - \mu_x(s)]^T\} = \begin{bmatrix} E\{\tilde x_1(t) \tilde x_1(s)\} & \cdots & E\{\tilde x_1(t) \tilde x_n(s)\} \\ \vdots & \ddots & \vdots \\ E\{\tilde x_n(t) \tilde x_1(s)\} & \cdots & E\{\tilde x_n(t) \tilde x_n(s)\} \end{bmatrix}$$
where $\tilde x(t) := x(t) - \mu_x(t)$. We see that the diagonal elements of $\Lambda_{xx}(t, s)$ are the covariance functions of $\{x_i(t), t = 0, \pm 1, \ldots\}$, and the off-diagonal elements are the cross-covariance functions of $x_i(t)$ and $x_j(t)$, $i \neq j$.


4.2 Stationary Stochastic Processes 


Consider a stochastic process $\{x(t), t = 0, \pm 1, \ldots\}$ whose statistical properties do not change in time. This is roughly equivalent to saying that the future is statistically the same as the past, and can be expressed in terms of the joint probability density functions as
$$p_{t_1, \ldots, t_k}(x_1, \ldots, x_k) = p_{t_1 + l, \ldots, t_k + l}(x_1, \ldots, x_k), \qquad l = 0, \pm 1, \ldots \qquad (4.6)$$

If (4.6) holds, $\{x(t)\}$ is called a strongly stationary process.
Let $\{x(t), t = 0, \pm 1, \ldots\}$ be a strongly stationary process with a finite kth-order moment function. It follows from (4.6) that
$$\begin{aligned}
M(t_1, \ldots, t_k) &= M(t_1 + l, \ldots, t_k + l) \\
&= M(t_1 - t_k, \ldots, t_{k-1} - t_k, 0), \qquad l = 0, \pm 1, \ldots \qquad (4.7)
\end{aligned}$$

In particular, for the mean and covariance functions, we have
$$\mu_x(t) = E\{x(t)\} = \mu_x(0), \qquad \Lambda_{xx}(t, s) = \Lambda_{xx}(t - s, 0)$$

Thus, for a strongly stationary process with a finite second-order moment, we see that the mean function is constant and the covariance function depends only on the time difference. In this case, the covariance function is simply written as $\Lambda_{xx}(t - s)$ instead of $\Lambda_{xx}(t, s)$.

Let $\{x(t), t = 0, \pm 1, \ldots\}$ be a second-order stochastic process. If the mean function is constant and the covariance function depends only on the time difference, then the process is called a weakly stationary process. Clearly, a strongly stationary process with a finite variance is weakly stationary, but the converse is not true. In fact, the probability density functions of a second-order stationary process need not satisfy the shift invariance (4.6). However, note that a weakly stationary Gaussian process is strongly stationary.


Example 4.3. (Random walk) We compute the mean and variance of the random walk considered in Example 4.1. Since v is a zero mean Gaussian white noise with unit variance, we have $E\{v(t)\} = 0$ and $E\{v(t)v(s)\} = \delta_{ts}$. Hence, the mean of x(t) becomes
$$\mu_x(t) = E\{v(1) + v(2) + \cdots + v(t)\} = 0$$

Since $x(t) - x(s) = \sum_{k=s+1}^{t} v(k)$ and $x(s) = \sum_{k=1}^{s} v(k)$ are independent for $t > s$, we get
$$\Lambda_{xx}(t, s) = E\{x(t)x(s)\} = E\{(x(t) - x(s))x(s)\} + E\{x^2(s)\} = s$$
Similarly, for $t < s$, we get $\Lambda_{xx}(t, s) = t$. Thus the covariance function of the random walk is given by $\Lambda_{xx}(t, s) = \min(t, s)$, which is not a function of the difference $t - s$, so that the random walk is a non-stationary process.

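The result $\Lambda_{xx}(t, s) = \min(t, s)$ can be checked by a Monte Carlo sketch that averages $x(t)x(s)$ over many independent realizations of (4.3), i.e., an ensemble average. The function name, the number of trials and the seed are arbitrary choices made for illustration.

```python
import random


def random_walk_covariance(t, s, trials=20000, seed=0):
    """Ensemble-average estimate of E{x(t) x(s)} for the random walk (4.3).

    Each trial draws v(1), v(2), ... as zero-mean, unit-variance Gaussian
    white noise and builds x(k) = x(k-1) + v(k) with x(0) = 0.
    """
    rng = random.Random(seed)
    horizon = max(t, s)
    total = 0.0
    for _ in range(trials):
        path = [0.0]  # path[k] holds x(k)
        for _ in range(horizon):
            path.append(path[-1] + rng.gauss(0.0, 1.0))
        total += path[t] * path[s]
    return total / trials


for t, s in [(5, 3), (2, 8), (6, 6)]:
    print(t, s, round(random_walk_covariance(t, s), 2), "vs min(t, s) =", min(t, s))
```

The estimates fluctuate around $\min(t, s)$ with a spread that shrinks like one over the square root of the number of trials.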

We now consider a second-order stationary process $\{x(t), t = 0, \pm 1, \ldots\}$. Since the mean function $\mu_x$ is constant, we put $\tilde x(t) := x(t) - \mu_x$. Then, without loss of generality, we can assume from the outset that a stationary process has zero mean. Moreover, being dependent only on the time difference $t - s$, the covariance function is written as
$$\Lambda_{xx}(l) = E\{x(t + l)x(t)\}, \qquad l = 0, \pm 1, \ldots \qquad (4.8)$$
Lemma 4.1. The covariance function $\Lambda_{xx}(l)$ has the following properties.
(i) (Boundedness) $|\Lambda_{xx}(l)| \leq \Lambda_{xx}(0) = \sigma_x^2$, $l = 0, \pm 1, \ldots$
(ii) (Symmetry) $\Lambda_{xx}(l) = \Lambda_{xx}(-l)$, $l = 1, 2, \ldots$

(iii) (Nonnegativeness) For any $l_1, \ldots, l_n \in \mathbb{Z}$ and $a_1, \ldots, a_n \in \mathbb{R}$, we have
$$\sum_{i,k=1}^{n} a_i a_k \Lambda_{xx}(l_i - l_k) \geq 0$$

Proof. Item (i) is proved by putting $\xi = x(l)$, $\eta = x(0)$ in the Schwarz inequality $|E\{\xi\eta\}|^2 \leq E\{\xi^2\} E\{\eta^2\}$. Item (ii) is obvious from stationarity, and (iii) is obtained from $E\{|\sum_{i=1}^{n} a_i x(l_i)|^2\} \geq 0$.


Consider a joint process $\{x(t), y(t), t = 0, \pm 1, \ldots\}$ with zero means. If the vector process $w = \begin{bmatrix} x \\ y \end{bmatrix}$ is stationary, then we say that x and y are jointly stationary.

Since the covariance matrix of w is given by
$$\begin{bmatrix} \Lambda_{xx}(t+l, t) & \Lambda_{xy}(t+l, t) \\ \Lambda_{yx}(t+l, t) & \Lambda_{yy}(t+l, t) \end{bmatrix} = E\left\{ \begin{bmatrix} x(t+l) \\ y(t+l) \end{bmatrix} \begin{bmatrix} x(t) & y(t) \end{bmatrix} \right\} = \operatorname{cov}\{w(t+l), w(t)\} \qquad (4.9)$$
the stationarity of w implies that the four covariances of (4.9) are functions of the time difference l only. The expectation of the product $x(t + l)y(t)$, i.e.,
$$\Lambda_{xy}(l) = E\{x(t + l)y(t)\} \qquad (4.10)$$
is called the cross-covariance function of x and y. If $\Lambda_{xy}(l) = 0$ for all l, then the two processes x and y are mutually uncorrelated or orthogonal.
Lemma 4.2. The cross-covariance function $\Lambda_{xy}(l)$ has the following properties.
(i) (Anti-symmetry) $\Lambda_{xy}(l) = \Lambda_{yx}(-l)$, $l = 1, 2, \ldots$
(ii) (Boundedness) $|\Lambda_{xy}(l)|^2 \leq \Lambda_{xx}(0) \Lambda_{yy}(0)$, $l = \pm 1, \pm 2, \ldots$

Proof. (i) Obvious. (ii) This is easily proved by the Schwarz inequality.

4.3 Ergodic Processes


A basic problem in the analysis of stationary stochastic processes is to estimate statistical parameters from observed data. Since the parameters are related to expected values of some functions of a stochastic process, we study the problem of estimating the mean and the covariance function of a stationary stochastic process.

Suppose that $\{x(t), t = 0, \pm 1, \ldots\}$ is a second-order stationary process with mean zero. Define a time average of the process by
$$r_{xx}(l) = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{t=-N}^{N} x(t + l)x(t), \qquad l = 0, \pm 1, \ldots \qquad (4.11)$$
This quantity is also called the (auto-) covariance function. The covariance function of (4.8) is defined as an ensemble average, but $r_{xx}(l)$ of (4.11) is defined as a time average for a sample process
$$x = (\ldots, x(-1), x(0), x(1), \ldots)$$

For data analysis, we deal with a time function, or a sample process, generated from a particular experiment rather than an ensemble. Hence, from a practical point of view, the definition of moment functions in terms of the time average is preferable to the one defined by the ensemble average. However, we do not know in general whether the time average $r_{xx}(l)$ is equal to the ensemble average $\Lambda_{xx}(l)$ or not.

A stochastic process whose statistical properties are determined from its sample 
process is called an ergodic process. In other words, for an ergodic process, the time 
average equals the ensemble average. In the following, we state ergodic theorems for 
the mean and covariance functions. 

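We first consider the ergodic theorem for the mean. Let x be a second-order stationary stochastic process with mean $\mu_x$, and consider the sample mean
$$m(N) = \frac{1}{2N + 1} \sum_{t=-N}^{N} x(t) \qquad (4.12)$$
Then, we see that $E\{m(N)\} = \mu_x$, which implies that the mathematical expectation of m(N) is equal to the ensemble mean. Also, the variance of m(N) is given by
$$\begin{aligned}
E\{(m(N) - \mu_x)^2\} &= E\Big\{\Big[\frac{1}{2N + 1} \sum_{t=-N}^{N} (x(t) - \mu_x)\Big]^2\Big\} \\
&= \frac{1}{(2N + 1)^2} \sum_{t=-N}^{N} \sum_{s=-N}^{N} \Lambda_{xx}(t - s) \\
&= \frac{1}{2N + 1} \sum_{k=-2N}^{2N} \Big(1 - \frac{|k|}{2N + 1}\Big) \Lambda_{xx}(k) \qquad (4.13)
\end{aligned}$$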

Thus we have the following theorem. 
Theorem 4.1. (Mean ergodic theorem) A necessary and sufficient condition that $\lim_{N \to \infty} m(N) = \mu_x$ holds in the quadratic mean is that
$$\lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-2N}^{2N} \Big(1 - \frac{|k|}{2N + 1}\Big) \Lambda_{xx}(k) = 0 \qquad (4.14)$$

Proof. For a proof, see Problem 4.2.


We see that if $\lim_{l \to \infty} \Lambda_{xx}(l) = 0$, then the Cesàro sum also converges to zero, i.e.,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{l=0}^{N-1} \Lambda_{xx}(l) = 0 \qquad (4.15)$$
holds, and hence (4.14) follows (see Problem 4.3 (b)).

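The last step of (4.13) is a counting argument: exactly $2N + 1 - |k|$ pairs $(t, s)$ in the double sum satisfy $t - s = k$. The sketch below checks this identity exactly with rational arithmetic for an assumed covariance $\Lambda_{xx}(l) = (1/2)^{|l|}$, chosen only for illustration, and shows the variance of m(N) shrinking as N grows. The function names are hypothetical.

```python
from fractions import Fraction


def cov_example(l):
    """Assumed covariance Lambda_xx(l) = (1/2)^{|l|} (exponential, as in Example 4.4)."""
    return Fraction(1, 2) ** abs(l)


def var_double_sum(cov, N):
    """Middle expression of (4.13): (1/(2N+1)^2) sum_t sum_s Lambda_xx(t - s)."""
    total = sum(cov(t - s) for t in range(-N, N + 1) for s in range(-N, N + 1))
    return Fraction(total) / (2 * N + 1) ** 2


def var_cesaro(cov, N):
    """Last expression of (4.13): (1/(2N+1)) sum_{|k|<=2N} (1 - |k|/(2N+1)) Lambda_xx(k)."""
    M = 2 * N + 1
    return sum((1 - Fraction(abs(k), M)) * cov(k) for k in range(-2 * N, 2 * N + 1)) / M


for N in (1, 5, 20):
    print(N, var_double_sum(cov_example, N) == var_cesaro(cov_example, N),
          float(var_cesaro(cov_example, N)))
```

Since $\sum_k \Lambda_{xx}(k) = 3$ for this choice, $(2N + 1) E\{(m(N) - \mu_x)^2\}$ approaches 3, so the variance itself tends to zero, in line with (4.14).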
Next we consider an ergodic theorem for a covariance function. Suppose that x is a stationary process with mean zero. Let the sample average of the product $x(t + l)x(t)$ be defined by
$$r_{xx}(l; N) = \frac{1}{2N + 1} \sum_{t=-N}^{N} x(t + l)x(t) \qquad (4.16)$$

Obviously we have
$$E\{r_{xx}(l; N)\} = \frac{1}{2N + 1} \sum_{t=-N}^{N} E\{x(t + l)x(t)\} = \Lambda_{xx}(l)$$
so that the expectation of $r_{xx}(l; N)$ equals the covariance function $\Lambda_{xx}(l)$.
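To evaluate the variance of $r_{xx}(l; N)$, we define $\xi(t) = x(t + l)x(t)$, and apply the mean ergodic theorem to $\xi(t)$. We see that $\mu_\xi = E\{\xi(t)\} = \Lambda_{xx}(l)$ and that
$$\begin{aligned}
\Lambda_{\xi\xi}(k) &= E\{[x(t + l + k)x(t + k) - \mu_\xi][x(t + l)x(t) - \mu_\xi]\} \\
&= E\{x(t + l + k)x(t + k)x(t + l)x(t)\} - \mu_\xi^2 \qquad (4.17)
\end{aligned}$$

Also, similarly to the derivation of (4.13), it follows that
$$\begin{aligned}
E\{[r_{xx}(l; N) - \Lambda_{xx}(l)]^2\} &= E\Big\{\Big[\frac{1}{2N + 1} \sum_{t=-N}^{N} (\xi(t) - \mu_\xi)\Big]^2\Big\} \\
&= \frac{1}{2N + 1} \sum_{k=-2N}^{2N} \Big(1 - \frac{|k|}{2N + 1}\Big) \Lambda_{\xi\xi}(k)
\end{aligned}$$

Thus, we have an ergodic theorem for the covariance function.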
Theorem 4.2. (Covariance ergodic theorem) A necessary and sufficient condition that
$$\lim_{N \to \infty} r_{xx}(l; N) = \Lambda_{xx}(l) \qquad (4.18)$$
holds in the quadratic mean is that
$$\lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-2N}^{2N} \Big(1 - \frac{|k|}{2N + 1}\Big) \Lambda_{\xi\xi}(k) = 0 \qquad (4.19)$$

Moreover, suppose that x is a Gaussian process with mean zero. If the condition
$$\lim_{N \to \infty} \frac{1}{N} \sum_{l=0}^{N-1} \Lambda_{xx}^2(l) = 0 \qquad (4.20)$$
is satisfied, then (4.19) and hence (4.18) hold in the quadratic mean.
Proof. See Problem 4.4.


Example 4.4. Consider a zero mean Gaussian process x with the covariance function $\Lambda_{xx}(l) = \sigma^2 a^{|l|}$, $l = 0, \pm 1, \ldots$ $(0 < |a| < 1, \ \sigma^2 > 0)$ (see Figure 4.2 below). Since (4.15) and (4.20) are satisfied, Theorems 4.1 and 4.2 indicate that a Gaussian process with an exponential covariance function is ergodic.

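Example 4.4 can be illustrated by simulation. A single long sample path of a zero mean Gaussian process with $\Lambda_{xx}(l) = \sigma^2 a^{|l|}$ can be generated by the first-order AR model (4.33) of Example 4.5 below, and its time averages should approach the ensemble values. This sketch makes three assumptions. First, the initial condition is stationary, $y(0) \sim N(0, \sigma^2)$. Second, a one-sided finite-length average is used in place of the symmetric sum in (4.16). Third, the names, path length and seed are illustrative choices only.

```python
import random


def simulate_ar1(a, sigma, length, seed=0):
    """One sample path of y(t+1) = a y(t) + sqrt(1 - a^2) e(t), cf. (4.33).

    e(t) is Gaussian white noise with standard deviation sigma.  y(0) is drawn
    from N(0, sigma^2), the stationary distribution, so the path is stationary.
    """
    rng = random.Random(seed)
    gain = (1.0 - a * a) ** 0.5
    y = [rng.gauss(0.0, sigma)]
    for _ in range(length - 1):
        y.append(a * y[-1] + gain * rng.gauss(0.0, sigma))
    return y


def time_average_cov(y, lag):
    """Finite-sample time average of y(t + lag) y(t) over all available t."""
    pairs = len(y) - lag
    return sum(y[t + lag] * y[t] for t in range(pairs)) / pairs


path = simulate_ar1(a=0.8, sigma=1.0, length=200000, seed=1)
for lag in range(4):
    print(lag, round(time_average_cov(path, lag), 3), "vs", 0.8 ** lag)
```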

4.4 Spectral Analysis 


We consider a second-order stationary process $\{x(t), t = 0, \pm 1, \ldots\}$ with mean zero. Suppose that its covariance function $\{\Lambda_{xx}(l), l = 0, \pm 1, \ldots\}$ satisfies the summability condition
$$\sum_{l=-\infty}^{\infty} |\Lambda_{xx}(l)| < \infty \qquad (4.21)$$

Definition 4.1. Suppose that the covariance function satisfies the condition (4.21). Then, the Fourier transform (or two-sided z-transform) of $\Lambda_{xx}(l)$ is defined by
$$\Phi_{xx}(z) = \sum_{l=-\infty}^{\infty} \Lambda_{xx}(l) z^{-l} \qquad (4.22)$$
This is called the spectral density function of the stochastic process $\{x(t)\}$.

Putting $z = e^{j\omega}$, $-\pi \leq \omega \leq \pi$, the spectral density function can be viewed as a function of ω (rad)
$$\Phi_{xx}(\omega) = \sum_{l=-\infty}^{\infty} e^{-j\omega l} \Lambda_{xx}(l), \qquad -\pi \leq \omega \leq \pi \qquad (4.23)$$
We observe from the definition (4.23) that the spectral density function shows the distribution of power of the stationary process in the frequency domain.

It is well known that the covariance function is expressed as an inverse transform of the spectral density function as
$$\begin{aligned}
\Lambda_{xx}(l) &= \frac{1}{2\pi j} \oint_{|z|=1} \Phi_{xx}(z) z^{l-1} \, dz \qquad (4.24a) \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega l} \Phi_{xx}(\omega) \, d\omega, \qquad l = 0, \pm 1, \ldots \qquad (4.24b)
\end{aligned}$$

The relations in (4.24) are called the Wiener-Khinchine formula. We see from (4.22) and (4.24a) that the covariance function and spectral density function involve the same information about a stationary stochastic process, since there exists a one-to-one correspondence between them.

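If the sampling interval is given by Δt, then the spectral density function is defined by
$$\Phi_{xx}(\nu; \Delta t) = \Delta t \sum_{l=-\infty}^{\infty} e^{-j\nu l \Delta t} \Lambda_{xx}(l), \qquad -\frac{\pi}{\Delta t} \leq \nu \leq \frac{\pi}{\Delta t} \qquad (4.25)$$
and its inverse is
$$\Lambda_{xx}(l) = \frac{1}{2\pi} \int_{-\pi/\Delta t}^{\pi/\Delta t} e^{j\nu l \Delta t} \Phi_{xx}(\nu; \Delta t) \, d\nu, \qquad l = 0, \pm 1, \ldots \qquad (4.26)$$

It should be noted that ω in (4.23) and ν in (4.25) are related by $\omega = \nu \Delta t$, and hence ν has the dimension [rad/sec].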
Lemma 4.3. The spectral density function satisfies the following.
(i) (Symmetry) $\Phi_{xx}(\omega) = \Phi_{xx}(-\omega)$, $-\pi \leq \omega \leq \pi$
(ii) (Nonnegativeness) $\Phi_{xx}(\omega) \geq 0$, $-\pi \leq \omega \leq \pi$

Proof. (i) The symmetry is immediate from $\Lambda_{xx}(l) = \Lambda_{xx}(-l)$. (ii) This follows from the nonnegativeness of $\Lambda_{xx}(l)$ described in Lemma 4.1 (iii). For an alternative proof, see Problem 4.5.


Figure 4.1. Discrete-time LTI system

Consider a discrete-time LTI system depicted in Figure 4.1, where u is the input and y is the output, and the impulse response of the system is given by $\{g(k), k = 0, 1, \ldots\}$ with $g(k) = 0$, $k < 0$. The transfer function is then expressed as
$$G(z) = \sum_{k=0}^{\infty} g(k) z^{-k} \qquad (4.27)$$


As mentioned in Theorem 3.1, G(z) is stable if and only if the impulse response 
sequence {g(k), k =0, 1, --- } is absolutely summable. 


Lemma 4.4. Consider an LTI system shown in Figure 4.1 with G(z) stable. Suppose that the input u is a zero mean second-order stationary process with the covariance function $\Lambda_{uu}(l)$ satisfying
$$\sum_{l=-\infty}^{\infty} |\Lambda_{uu}(l)| < \infty \qquad (4.28)$$

Then, the output y is also a zero mean second-order stationary process with the spectral density function of the form
$$\Phi_{yy}(z) = G(z) G(z^{-1}) \Phi_{uu}(z) \qquad (4.29)$$
or
$$\Phi_{yy}(\omega) = |G(e^{j\omega})|^2 \Phi_{uu}(\omega) \qquad (4.30)$$
Further, the variance of y is given by
$$\sigma_y^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |G(e^{j\omega})|^2 \Phi_{uu}(\omega) \, d\omega \qquad (4.31)$$

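Proof. Since G(z) is stable, the output y(t) is expressed as
$$y(t) = \sum_{i=0}^{\infty} g(i) u(t - i)$$
Hence, it follows that $\mu_y = E\{y(t)\} = 0$ and that the covariance function of y is given by
$$\Lambda_{yy}(l) = E\{y(t + l)y(t)\} = \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} g(i) g(k) \Lambda_{uu}(l + k - i) \qquad (4.32)$$

Since the right-hand side is a function of l only, so is the left-hand side. Taking the sum of absolute values of the above equation, it follows from (4.28) and the stability of G(z) that
$$\sum_{l=-\infty}^{\infty} |\Lambda_{yy}(l)| \leq \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} |g(i)| \, |g(k)| \sum_{l=-\infty}^{\infty} |\Lambda_{uu}(l + k - i)| = \Big( \sum_{i=0}^{\infty} |g(i)| \Big)^2 \sum_{l=-\infty}^{\infty} |\Lambda_{uu}(l)| < \infty$$
where the sum with respect to l should be taken first. From this equation, we see that $|\Lambda_{yy}(0)| < \infty$, implying that y is a second-order process. Also, we can take the Fourier transform of both sides of (4.32) to get
$$\begin{aligned}
\Phi_{yy}(z) &= \sum_{l=-\infty}^{\infty} \Lambda_{yy}(l) z^{-l} = \sum_{l=-\infty}^{\infty} \Big[ \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} g(i) g(k) \Lambda_{uu}(l + k - i) \Big] z^{-l} \\
&= \sum_{i=0}^{\infty} g(i) z^{-i} \sum_{k=0}^{\infty} g(k) z^{k} \sum_{l=-\infty}^{\infty} \Lambda_{uu}(l + k - i) z^{-(l + k - i)} \\
&= G(z) G(z^{-1}) \Phi_{uu}(z)
\end{aligned}$$

Equation (4.30) follows by putting $z = e^{j\omega}$, since $G(e^{-j\omega})$ is the complex conjugate of $G(e^{j\omega})$ for real g(k). Finally, putting l = 0 in (4.24b) applied to y and using (4.30) gives (4.31).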

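Formula (4.32) can be checked numerically for the first-order system of Example 4.5 below, $G(z) = \sqrt{1 - a^2}/(z - a)$. Its impulse response is $g(0) = 0$ and $g(i) = \sqrt{1 - a^2}\, a^{i-1}$ for $i \geq 1$. With a white noise input of variance $\sigma^2$, the double sum should reproduce $\Lambda_{yy}(l) = \sigma^2 a^{|l|}$. The sketch truncates the impulse response to K terms, an assumption that is harmless here because $|a| < 1$. The function names are hypothetical.

```python
import math


def ar1_impulse_response(a, K):
    """g(0) = 0 and g(i) = sqrt(1 - a^2) a^(i-1), i = 1, ..., K-1, for G(z) = sqrt(1-a^2)/(z-a)."""
    gain = math.sqrt(1.0 - a * a)
    return [0.0] + [gain * a ** (i - 1) for i in range(1, K)]


def white_cov(sigma2):
    """Covariance function of a white noise with variance sigma2: sigma2 * delta(l)."""
    return lambda l: sigma2 if l == 0 else 0.0


def output_covariance(g, cov_u, lag):
    """Truncated (4.32): sum_{i,k} g(i) g(k) Lambda_uu(lag + k - i) over the K available terms."""
    K = len(g)
    return sum(g[i] * g[k] * cov_u(lag + k - i) for i in range(K) for k in range(K))


g = ar1_impulse_response(0.8, 200)
for lag in range(-2, 3):
    print(lag, output_covariance(g, white_cov(2.0), lag), 2.0 * 0.8 ** abs(lag))
```

Because the input is white, only the terms with $k = i - l$ survive, which is why the double sum collapses to a geometric series in a.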

Example 4.5. Consider a system $G(z) = \sqrt{1 - a^2}/(z - a)$, $|a| < 1$, where the input is a Gaussian white noise e with mean zero and variance $\sigma^2$. We see that the output of the system is described by the first-order autoregressive (AR) model
$$y(t + 1) = a y(t) + \sqrt{1 - a^2} \, e(t), \qquad t = 0, 1, \ldots \qquad (4.33)$$

The output process is called a first-order AR process. We observe that the future $y(t + 1)$ depends partly on the present $y(t)$ and partly on the random noise $e(t)$, so that y is a Markov process. Since the spectral density function of e is $\Phi_{ee}(\omega) = \sigma^2$, it follows from (4.32) and (4.30) that the covariance function and the spectral density function of the output process y are respectively given by
$$\Lambda_{yy}(l) = \sigma^2 a^{|l|}, \qquad l = 0, \pm 1, \ldots$$
and
$$\Phi_{yy}(\omega) = \frac{\sigma^2 (1 - a^2)}{1 + a^2 - 2a \cos\omega}, \qquad -\pi \leq \omega \leq \pi$$


The auto-covariance functions and spectral density functions for a = 0.4, 0.8 and $\sigma^2 = 1$ are displayed in Figures 4.2 and 4.3. For larger a, the covariance function decays slowly as |l| increases, and the power is concentrated in the low frequency range. For smaller a, on the other hand, the covariance function decays rapidly and the power is distributed over a wide frequency range.

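The two expressions in Example 4.5 can be cross-checked numerically. Truncating the sum (4.23) for $\Lambda_{yy}(l) = \sigma^2 a^{|l|}$ should reproduce the closed-form $\Phi_{yy}(\omega)$. Conversely, evaluating (4.24b) by a rectangle rule on $[-\pi, \pi)$ should recover $\Lambda_{yy}(l)$, with l = 0 giving the variance as in (4.31). The truncation length and the number of quadrature points are arbitrary choices for this sketch. The rectangle rule is used because it is very accurate for smooth periodic integrands.

```python
import cmath
import math


def spectrum_from_cov(cov, omega, L):
    """Truncated (4.23): sum_{|l|<=L} e^{-j omega l} Lambda(l), returned as a complex number."""
    return sum(cmath.exp(-1j * omega * l) * cov(l) for l in range(-L, L + 1))


def ar1_spectrum(a, sigma2, omega):
    """Closed form of Example 4.5: sigma^2 (1 - a^2) / (1 + a^2 - 2 a cos omega)."""
    return sigma2 * (1.0 - a * a) / (1.0 + a * a - 2.0 * a * math.cos(omega))


def cov_from_spectrum(spec, lag, points=2000):
    """(4.24b) by the rectangle rule on [-pi, pi): (1/2pi) * integral of e^{j omega l} Phi(omega)."""
    h = 2.0 * math.pi / points
    total = 0.0
    for n in range(points):
        omega = -math.pi + n * h
        total += (cmath.exp(1j * omega * lag) * spec(omega)).real
    return total * h / (2.0 * math.pi)


for a in (0.4, 0.8):
    spec = lambda w, a=a: ar1_spectrum(a, 1.0, w)
    print(a, [round(spec(w), 3) for w in (0.0, math.pi / 2, math.pi)],
          round(cov_from_spectrum(spec, 0), 6))
```

For both values of a, the recovered $\Lambda_{yy}(0)$ equals $\sigma^2 = 1$, while the spectrum at ω = 0 is much larger for a = 0.8, matching the low-frequency concentration described above.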

The next example is concerned with an autoregressive moving average (ARMA) 
model. 


Example 4.6. Let e be a zero mean white noise with variance $\sigma^2$. Suppose that the system is described by a difference equation
$$y(t) + \sum_{i=1}^{p} a_i y(t - i) = e(t) + \sum_{i=1}^{q} c_i e(t - i) \qquad (4.34)$$
Figure 4.2. Auto-covariance functions (horizontal axis: lag l)

Figure 4.3. Spectral density functions $\Phi_{yy}(\omega)$ (horizontal axis: frequency ω)


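This equation is called an ARMA model of order (p, q). From (4.34), we have
$$A(z) y(t) = C(z) e(t)$$
where A(z) and C(z) are given by
$$\begin{aligned}
A(z) &= 1 + a_1 z^{-1} + \cdots + a_p z^{-p} \\
C(z) &= 1 + c_1 z^{-1} + \cdots + c_q z^{-q}
\end{aligned}$$
and $z^{-1}$ acts as the backward shift operator, $z^{-1} y(t) = y(t-1)$.

The process y generated by the ARMA model is called an ARMA process.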

The condition that y is stationary is that all the zeros of A(z) are within the unit circle (|z| < 1). Moreover, the invertibility of the ARMA model requires that all the zeros of C(z) be located within the unit circle. Under these conditions, both
$$H(z) = \frac{C(z)}{A(z)}, \qquad \frac{1}{H(z)} = \frac{A(z)}{C(z)}$$
are stable¹. Since both H(z) and 1/H(z) are stable, we can generate the noise process e by feeding the output y to the inverse filter 1/H(z) as shown in Figure 4.4. Thus 1/H(z) is called a whitening filter. Also, by feeding the white noise to the filter H(z), we have the output y. Hence, the spectral density function of y is given by
$$\Phi_{yy}(\omega) = \sigma^2 \frac{|1 + c_1 e^{-j\omega} + \cdots + c_q e^{-j\omega q}|^2}{|1 + a_1 e^{-j\omega} + \cdots + a_p e^{-j\omega p}|^2} \qquad (4.35)$$
We see that the output spectrum has a certain distribution over the range (−π, π) determined by the filter used. Therefore, H(z) is often called a shaping filter. It should be noted that the design of a shaping filter is closely related to the spectral factorization and the stochastic realization problem to be discussed in later chapters.

¹ We say that a transfer function with this property is of minimal phase.


Figure 4.4. Whitening filter and shaping filter 

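The stationarity and invertibility conditions and the spectrum (4.35) can be evaluated as follows, assuming real coefficients. Instead of computing roots, the sketch uses the Schur-Cohn (Jury) step-down recursion to test whether all zeros of $1 + c_1 z^{-1} + \cdots + c_r z^{-r}$ lie strictly inside the unit circle. This choice of test and the function names are assumptions of the sketch, not the book's method. If both A(z) and C(z) pass the test, H(z) and 1/H(z) are stable, as required for the whitening and shaping filters of Figure 4.4.

```python
import cmath


def zeros_inside_unit_circle(coeffs):
    """Schur-Cohn step-down test for 1 + c_1 z^-1 + ... + c_r z^-r with real c_i.

    Its zeros are the roots of z^r + c_1 z^(r-1) + ... + c_r.  Returns True iff
    all of them satisfy |z| < 1.  coeffs = [c_1, ..., c_r].
    """
    c = list(coeffs)
    while c:
        k = c[-1]  # reflection coefficient of the current degree
        if abs(k) >= 1.0:
            return False
        r = len(c)
        # Reduced polynomial of degree r - 1.
        c = [(c[i] - k * c[r - 2 - i]) / (1.0 - k * k) for i in range(r - 1)]
    return True


def arma_spectrum(a, c, sigma2, omega):
    """(4.35) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and C(z) = 1 + c_1 z^-1 + ... + c_q z^-q."""
    z_inv = cmath.exp(-1j * omega)
    A = 1 + sum(ai * z_inv ** (i + 1) for i, ai in enumerate(a))
    C = 1 + sum(ci * z_inv ** (i + 1) for i, ci in enumerate(c))
    return sigma2 * abs(C) ** 2 / abs(A) ** 2


# ARMA(1,1): y(t) - 0.5 y(t-1) = e(t) + 0.3 e(t-1)
a, c = [-0.5], [0.3]
print(zeros_inside_unit_circle(a), zeros_inside_unit_circle(c))
print([round(arma_spectrum(a, c, 1.0, w), 4) for w in (0.0, cmath.pi / 2, cmath.pi)])
```

For instance, with a = [−0.8] and no MA part, `arma_spectrum` reduces to the AR(1) spectrum $\sigma^2/(1.64 - 1.6\cos\omega)$, whose shape matches Example 4.5 up to the input gain $\sqrt{1 - a^2}$.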

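The rest of this section is devoted to the spectral analysis of an n-dimensional vector stochastic process $\{x(t), t = 0, \pm 1, \ldots\}$. For simplicity, we assume that x has zero mean. Then the covariance matrix is given by
$$\Lambda_{xx}(l) = E\{x(t + l) x^T(t)\}, \qquad l = 0, \pm 1, \ldots \qquad (4.36)$$

Obviously, we have $\Lambda_{xx}(l) = \Lambda_{xx}^T(-l)$. Let the diagonal elements of $\Lambda_{xx}(l)$ be $\Lambda_{ii}(l)$, $i = 1, \ldots, n$. Suppose that
$$\sum_{l=-\infty}^{\infty} |\Lambda_{ii}(l)| < \infty, \qquad i = 1, \ldots, n$$
hold. Then, we can define the spectral density matrix by means of the Fourier transform of the covariance matrix as
$$\Phi_{xx}(z) = \sum_{l=-\infty}^{\infty} \Lambda_{xx}(l) z^{-l} \qquad (4.37)$$
or
$$\Phi_{xx}(\omega) = \sum_{l=-\infty}^{\infty} e^{-j\omega l} \Lambda_{xx}(l), \qquad -\pi \leq \omega \leq \pi \qquad (4.38)$$
where $\Phi_{xx}(z)$ and $\Phi_{xx}(\omega)$ are $n \times n$ matrices. In the matrix case, we also have the Wiener-Khinchine formula
$$\begin{aligned}
\Lambda_{xx}(l) &= \frac{1}{2\pi j} \oint_{|z|=1} \Phi_{xx}(z) z^{l-1} \, dz \qquad (4.39a) \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega l} \Phi_{xx}(\omega) \, d\omega, \qquad l = 0, \pm 1, \ldots \qquad (4.39b)
\end{aligned}$$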




The (i, k)-elements of $\Phi_{xx}(z)$ and $\Phi_{xx}(\omega)$ are respectively expressed as $\Phi_{ik}(z)$ and $\Phi_{ik}(\omega)$, which are called the cross-spectral density functions of $x_i(t)$ and $x_k(t)$. We see that $\Phi_{ii}(\omega)$ are real functions of the angular frequency ω, but $\Phi_{ik}(\omega)$, $i \neq k$, are in general complex functions.


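Lemma 4.5. The spectral density matrix $\Phi_{xx}(\omega)$ has the following properties.

(i) (Hermite) $\Phi_{xx}(\omega) = \Phi_{xx}^T(-\omega)$, $-\pi \leq \omega \leq \pi$
(ii) (Nonnegativeness) $\Phi_{xx}(\omega) \geq 0$, $-\pi \leq \omega \leq \pi$

Proof. Noting that $\Lambda_{xx}(l) = \Lambda_{xx}^T(-l)$, we get
$$\Phi_{xx}(\omega) = \sum_{l=-\infty}^{\infty} e^{-j\omega l} \Lambda_{xx}(l) = \sum_{l=-\infty}^{\infty} e^{-j\omega l} \Lambda_{xx}^T(-l) = \sum_{l=-\infty}^{\infty} e^{j\omega l} \Lambda_{xx}^T(l) = \Phi_{xx}^T(-\omega)$$
which proves (i). Since $\Lambda_{xx}(l)$ is real, $\Phi_{xx}^T(-\omega)$ is the conjugate transpose of $\Phi_{xx}(\omega)$, so that $\Phi_{xx}(\omega)$ is Hermitian. Now we prove (ii). Let the elements of x(t) be $x_i(t)$, $i = 1, \ldots, n$, and let $\xi(t) = \sum_{i=1}^{n} a_i x_i(t)$ with $a_i \in \mathbb{C}$. It is easy to see that ξ is a second-order stationary process with
$$\Lambda_{\xi\xi}(l) = \sum_{i,k=1}^{n} a_i \bar a_k \Lambda_{ik}(l)$$
where $\Lambda_{ik}(l) = E\{x_i(t + l) x_k(t)\}$. Taking the Fourier transform of $\Lambda_{\xi\xi}(l)$ yields
$$\Phi_{\xi\xi}(\omega) = \sum_{i,k=1}^{n} a_i \bar a_k \Phi_{ik}(\omega) \qquad (4.40)$$
From Lemma 4.3, we have $\Phi_{\xi\xi}(\omega) \geq 0$, so that the right-hand side of (4.40) is nonnegative for any $a_1, \ldots, a_n \in \mathbb{C}$. This indicates that $\Phi_{xx}(\omega)$ is nonnegative definite.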


4.5 Hilbert Space and Prediction Theory 


We consider a Hilbert space generated by a stationary stochastic process and a related problem of prediction. Let $\{y(t), t = 0, \pm 1, \ldots\}$ be a zero mean second-order stationary stochastic process. Let the space generated by all finite linear combinations of y be given by
$$\mathcal{H} = \Big\{ \xi = \sum_{k=k_1}^{k_2} a_k y(k) \;\Big|\; a_k \in \mathbb{R}, \; -\infty < k_1 < k_2 < \infty \Big\}$$

Define $\xi, \eta \in \mathcal{H}$ as
$$\xi = \sum_{i=i_1}^{i_2} a_i y(i), \qquad \eta = \sum_{j=j_1}^{j_2} b_j y(j)$$
where $i_1 < i_2$ and $j_1 < j_2$. Then we define the inner product of ξ and η as $\langle \xi, \eta \rangle_{\mathcal{H}} = E\{\xi\eta\}$. Hence we have
$$\langle \xi, \eta \rangle_{\mathcal{H}} = E\Big\{ \Big(\sum_{i=i_1}^{i_2} a_i y(i)\Big) \Big(\sum_{j=j_1}^{j_2} b_j y(j)\Big) \Big\} = \sum_{(i,j) \in D} E\{y(i)y(j)\} a_i b_j = \sum_{(i,j) \in D} \Lambda_{yy}(i - j) a_i b_j \qquad (4.41)$$
where $D = \{(i, j) \mid i_1 \leq i \leq i_2; \; j_1 \leq j \leq j_2\}$ is a finite set of indices.
Now suppose that the covariance matrix $\{\Lambda_{yy}(i - j)\}$ is positive definite. Then we can define
$$\|\xi\|_{\mathcal{H}}^2 = \langle \xi, \xi \rangle_{\mathcal{H}} = \sum_{(i,j) \in D} \Lambda_{yy}(i - j) a_i a_j$$

Then $\|\cdot\|_{\mathcal{H}}$ becomes a norm in $\mathcal{H}$ [106]. Hence the space $\mathcal{H}$ becomes a Hilbert space by completing it with respect to the norm $\|\cdot\|_{\mathcal{H}}$. The Hilbert space so obtained is written as
$$\mathcal{H} = \overline{\operatorname{span}}\{y(t) \mid -\infty < t < \infty\}$$
where $\overline{\operatorname{span}}$ denotes the closure of the vector space spanned by linear combinations of its elements. The Hilbert space generated by y is a subspace of the Hilbert space $L_2(\Omega)$ of square integrable random variables.


Example 4.7. Let $\{e(t), t = 0, 1, \ldots\}$ be a white noise with zero mean and unit variance. The Hilbert space
$$\mathcal{H} = \overline{\operatorname{span}}\{e(t) \mid t = 0, 1, \ldots\}$$
generated by the white noise $\{e(t)\}$ is defined as follows. For $a = (a_0, a_1, \ldots) \in \ell_2[0, \infty)$, we define the set consisting of partial sums of e as
$$\Big\{ \xi_n = \sum_{k=0}^{n} a_k e(k) \;\Big|\; \sum_{k=0}^{\infty} |a_k|^2 < \infty, \; a_k \in \mathbb{R} \Big\}$$
For $m > n$, we have
$$\|\xi_m - \xi_n\|_{\mathcal{H}}^2 = \sum_{k=n+1}^{m} |a_k|^2 \to 0 \qquad (m > n \to \infty)$$
Thus $\{\xi_n\}$ is a Cauchy sequence, so that there exists a quadratic mean limit
$$\xi = \lim_{n \to \infty} \xi_n = \sum_{k=0}^{\infty} a_k e(k)$$
Thus, by adjoining all possible quadratic mean limits, $\mathcal{H}$ becomes a Hilbert space. Hence $\mathcal{H}$ may be written as
$$\mathcal{H} = \Big\{ \xi = \sum_{k=0}^{\infty} a_k e(k) \;\Big|\; \sum_{k=0}^{\infty} |a_k|^2 < \infty, \; a_k \in \mathbb{R} \Big\}$$

The norm of $\xi \in \mathcal{H}$ is given by $\|\xi\|_{\mathcal{H}}^2 = \sum_{k=0}^{\infty} |a_k|^2 < \infty$. In this sense, the Hilbert space is also written as $\mathcal{H} = L_2(\Omega)$, where Ω is a set of stochastic parameters.


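For $\xi, \eta \in \mathcal{H}$, if $\langle \xi, \eta \rangle_{\mathcal{H}} = 0$ holds, then we say that $\xi$ and $\eta$ are orthogonal, and the orthogonality is written as $\xi \perp \eta$. Let $\mathcal{W}$ be a subspace of the Hilbert space $\mathcal{H}$. If $\langle \xi, w \rangle_{\mathcal{H}} = 0$ holds for any $w \in \mathcal{W}$, then $\xi$ is orthogonal to $\mathcal{W}$, which is written as $\xi \perp \mathcal{W}$, and the orthogonal complement is written as $\mathcal{W}^{\perp}$.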


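Lemma 4.6. Let $\mathcal{W}$ be a closed subspace of a Hilbert space $\mathcal{H}$. For any element $\xi \in \mathcal{H}$, there exists a unique $w_0 \in \mathcal{W}$ such that

$$\|\xi - w_0\|_{\mathcal{H}} \le \|\xi - w\|_{\mathcal{H}}, \qquad \forall w \in \mathcal{W} \tag{4.42}$$

Moreover, $w_0$ is a minimizing vector if and only if $\xi - w_0 \perp \mathcal{W}$. The element $w_0$ satisfying (4.42) is the orthogonal projection of $\xi$ onto the subspace $\mathcal{W}$, so that we write $w_0 = \hat{E}\{\xi \mid \mathcal{W}\}$.
Proof. See [111, 183].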


Let $\{y(t),\ t = 0, \pm 1, \dots\}$ be a second-order stationary stochastic process with mean zero. We consider the problem of predicting the future $y(t+m)$, $m = 1, 2, \dots$ in terms of a linear combination of the present and past $y(t), y(t-1), \dots$ in the least-squares sense. To this end, we define a Hilbert subspace generated by the present and past $y(t), y(t-1), \dots$ as

$$\mathcal{Y}_t = \Big\{ \xi_t = \sum_{k=0}^{\infty} a_k y(t-k) \ \Big|\ \sum_{k=0}^{\infty} |a_k| < \infty,\ a_k \in \mathbb{R} \Big\}$$

By definition, $A(z) = \sum_{k=0}^{\infty} a_k z^{-k}$ is a stable filter. Thus $\mathcal{Y}_t$ is a linear space generated by the outputs of stable LTI systems subjected to the inputs $y(\tau)$, $\tau \le t$, so that it is a subspace of the Hilbert space $\mathcal{H} = \overline{\operatorname{span}}\{y(t) \mid -\infty < t < \infty\}$. In fact, for $\xi, \eta \in \mathcal{Y}_t$, we have

$$\xi = \sum_{k=0}^{\infty} a_k y(t-k), \qquad \eta = \sum_{k=0}^{\infty} b_k y(t-k)$$

where $\sum_{k=0}^{\infty} |a_k| < \infty$ and $\sum_{k=0}^{\infty} |b_k| < \infty$. It follows that

$$\xi + \eta = \sum_{k=0}^{\infty} (a_k + b_k)\, y(t-k)$$

Since $|a_k + b_k| \le |a_k| + |b_k|$, we see that $\sum_{k=0}^{\infty} |a_k + b_k| < \infty$. Hence, it follows that $\xi + \eta \in \mathcal{Y}_t$ holds. Moreover, for any $\alpha \in \mathbb{R}$, we have $\alpha\xi \in \mathcal{Y}_t$, implying that $\mathcal{Y}_t$ is a linear space.

Since, in general, $y(t+m) \in \mathcal{H}$, $m > 0$ does not belong to $\mathcal{Y}_t$, the linear prediction problem reduces to finding the element $\hat{y}(t+m)$ of $\mathcal{Y}_t$ nearest to $y(t+m)$. It therefore follows from Lemma 4.6 that the optimal predictor is given by the orthogonal projection

$$\hat{y}(t+m) = \hat{E}\{y(t+m) \mid \mathcal{Y}_t\}$$

Define the variance of the prediction error by

$$\sigma_m^2 = E\{|y(t+m) - \hat{y}(t+m)|^2\}, \qquad m = 1, 2, \dots$$

Then, we see that the variance is independent of time $t$ due to the stationarity of $y$. Also, since $\mathcal{Y}_s \subset \mathcal{Y}_t$ for $s < t$, the variance $\sigma_m^2$ is a non-decreasing function of $m$, i.e.,

$$0 \le \sigma_1^2 \le \sigma_2^2 \le \cdots$$
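Lemma 4.6 turns prediction into an orthogonality condition. A minimal sketch of this, not taken from the text, truncates $\mathcal{Y}_t$ to the $p$ most recent samples. Requiring the error to be orthogonal to $y(t), \dots, y(t-p+1)$ then gives Toeplitz normal equations in $\Lambda_{yy}$. The helper names and the test process $y(t) = e(t) + 0.5e(t-1)$ with unit-variance $e$ are illustrative assumptions. The finite-order error variance decreases in $p$ toward $\sigma^2 = 1$, so $\sigma_1^2 > 0$ and this process is regular.

```python
def solve_linear(M, b):
    """Solve M g = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    g = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][k] * g[k] for k in range(i + 1, n))
        g[i] = (aug[i][n] - rest) / aug[i][i]
    return g


def one_step_predictor(cov, p):
    """Project y(t+1) onto span{y(t), ..., y(t-p+1)}.

    cov(l) returns Lambda_yy(l). Requiring the error to be orthogonal to
    each y(t-i) (Lemma 4.6) gives sum_j cov(i - j) g_j = cov(i + 1).
    Returns the coefficients g_0..g_{p-1} and the error variance.
    """
    M = [[cov(i - j) for j in range(p)] for i in range(p)]
    rhs = [cov(i + 1) for i in range(p)]
    g = solve_linear(M, rhs)
    err_var = cov(0) - sum(gi * ri for gi, ri in zip(g, rhs))
    return g, err_var


def ma1_cov(l, c=0.5, sigma2=1.0):
    """Lambda_yy(l) of the MA(1) process y(t) = e(t) + c e(t-1)."""
    l = abs(l)
    if l == 0:
        return sigma2 * (1.0 + c * c)
    if l == 1:
        return sigma2 * c
    return 0.0
```

For this MA(1) process the error variances for $p = 1$ and $p = 2$ are $1.05$ and $255/252 \approx 1.0119$, and they approach $1$ as $p$ grows.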


Definition 4.2. Consider the linear prediction problem for a second-order stationary stochastic process $y$ with mean zero. If $\sigma_1^2 > 0$, we say that $y$ is regular, or non-deterministic. On the other hand, if $\sigma_1^2 = 0$, then $y$ is called singular, or deterministic.

If $\sigma_1^2 > 0$, we have $\sigma_m^2 > 0$ for all $m = 1, 2, \dots$. Also, if $\sigma_1^2 = 0$, then it follows that

$$\hat{y}(t+1) = \hat{E}\{y(t+1) \mid \mathcal{Y}_t\} = y(t+1) \in \mathcal{Y}_t$$

holds for any $t$. Thus, $\mathcal{Y}_{t+1} = \mathcal{Y}_t$ holds for all $t$, so that $\mathcal{Y}_t$ is the same for all $t$. Hence, we get

$$0 = \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 = \cdots$$

Therefore, the variances $\sigma_m^2$ of prediction errors are either all positive or all zero. In the latter case, $y$ is completely predictable by means of its past values.
The following theorem is stated without proof. 


Theorem 4.3. (Wold decomposition theorem) Let $y$ be a second-order stationary stochastic process with mean zero. Then, $y$ is uniquely decomposed as

$$y(t) = u(t) + v(t) \tag{4.43}$$

where $u$ and $v$ have the following properties.

(i) The processes $u$ and $v$ are mutually uncorrelated.

(ii) The process $u$ has a moving average (MA) representation

$$u(t) = \sum_{i=0}^{\infty} h(i)\, e(t-i) \tag{4.44}$$

where $e$ is a white noise, and is uncorrelated with $v$. Further, $\{h(i)\}$ satisfy

$$\sum_{i=0}^{\infty} |h(i)|^2 < \infty, \qquad h(0) = 1 \tag{4.45}$$

(iii) Define $\mathcal{U}_t = \overline{\operatorname{span}}\{u(t), u(t-1), \dots\}$ and $\mathcal{E}_t = \overline{\operatorname{span}}\{e(t), e(t-1), \dots\}$. Then we have $\mathcal{U}_t = \mathcal{E}_t$ and the process $u$ is regular.

(iv) Define $\mathcal{V}_t = \overline{\operatorname{span}}\{v(t), v(t-1), \dots\}$. Then $\mathcal{V}_t = \mathcal{V}_s$ holds for all $t, s$, so that the process $v$ is called singular in the sense that it can be completely determined by linear functions of its past values.

Proof. For proofs, see Anderson [13], Koopmans [95], and Doob [44] (pp. 159-164 and pp. 569-577).


Example 4.8. (Singular process) From Theorem 4.3 (iv), it follows that $\mathcal{V}_{t+l} = \mathcal{V}_t$ for all $t, l$. Hence $v(t+l) \in \mathcal{V}_{t+l}$ is also in $\mathcal{V}_t$. Thus $\hat{E}\{v(t+l) \mid \mathcal{V}_t\} = v(t+l)$, implying that if $v(s)$, $s \le t$ are observed for some $t$, the future values $v(t+1), v(t+2), \dots$ are determined as linear functions of the past observed values, as for a sinusoid. Thus such a process is called deterministic.
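A sinusoid makes Example 4.8 concrete. The sketch below assumes a process $v(t) = A\cos(\omega t + \phi)$ with $A$ and $\phi$ fixed within one realization. For example, they may be drawn at random once, which is the usual way to make such a process stationary. The function name and the numbers in the test are illustrative. Every such signal obeys $v(t+1) = 2\cos\omega\, v(t) - v(t-1)$, so two past values determine the entire future and the prediction error is zero up to rounding.

```python
import math


def predict_sinusoid(past, omega, steps):
    """Extend v(t) = A cos(omega t + phi) using only its two latest values.

    Every sinusoid of frequency omega satisfies
    v(t+1) = 2 cos(omega) v(t) - v(t-1), so the future is a fixed linear
    function of the past and the prediction error is zero.
    """
    v_prev, v_curr = past[-2], past[-1]
    out = []
    for _ in range(steps):
        v_next = 2.0 * math.cos(omega) * v_curr - v_prev
        out.append(v_next)
        v_prev, v_curr = v_curr, v_next
    return out
```

The amplitude and phase never enter the predictor. They are carried implicitly by the two observed values.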


Theorem 4.4. Let $y$ be a zero mean, stationary process with the spectral density function $\Phi_{yy}(\omega)$. Then, $y$ is regular if and only if

$$c_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log \Phi_{yy}(\omega)\, d\omega > -\infty \tag{4.46}$$

This is called the regularity condition due to Szegő [65]. Under the regularity condition, there exists a unique sequence $\{h(i),\ i = 0, 1, \dots\}$ such that (4.45) holds, and the transfer function

$$H(z) = \sum_{i=0}^{\infty} h(i) z^{-i}, \qquad h(0) = 1 \tag{4.47}$$

has no zeros in $|z| > 1$, and provides a spectral factorization of the form

$$\Phi_{yy}(z) = \sigma^2 H(z) H(z^{-1}) \tag{4.48}$$

where the spectral factor $H(z)$ is analytic outside the unit circle ($|z| > 1$), satisfying (4.45) and $\sigma^2 = e^{c_0}$.
Proof. For a complete proof, see Doob [44] (pp. 159-164 and pp. 569-577). But, we follow [134, 178] to prove (4.47) and (4.48) under a stronger assumption that $\log \Phi_{yy}(z)$ is analytic in an annulus $\rho < |z| < 1/\rho$ with $0 < \rho < 1$.

Under this assumption, $\log \Phi_{yy}(z)$ has a Laurent expansion

$$\log \Phi_{yy}(z) = \sum_{l=-\infty}^{\infty} c_l z^{-l}, \qquad \rho < |z| < 1/\rho \tag{4.49}$$




By using the inversion formula [see (4.24)], we have

$$c_l = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log \Phi_{yy}(\omega)\, e^{j\omega l}\, d\omega, \qquad l = 0, \pm 1, \dots \tag{4.50}$$


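For $l = 0$, we have the equality in (4.46). Since $c_l$ are the Fourier coefficients of an even, real-valued function $\log \Phi_{yy}(\omega)$, they satisfy $c_{-l} = c_l$, $l = 0, \pm 1, \dots$. Thus, for $\rho < |z| < 1/\rho$,

$$\Phi_{yy}(z) = \exp\Big\{\sum_{l=-\infty}^{\infty} c_l z^{-l}\Big\} = e^{c_0} \exp\Big\{\sum_{l=1}^{\infty} c_l z^{-l}\Big\} \exp\Big\{\sum_{l=1}^{\infty} c_l z^{l}\Big\}$$

Now we define

$$H(z) = \exp\Big\{\sum_{l=1}^{\infty} c_l z^{-l}\Big\} \tag{4.51}$$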


Since the power series in the braces $\{\cdots\}$ of (4.51) converges in $|z| > \rho$, we see that $H(z)$ is analytic in $|z| > \rho$, and $H(\infty) = 1$. Thus, $H(z)$ of (4.51) has a Taylor series expansion

$$H(z) = \sum_{i=0}^{\infty} h(i) z^{-i}, \qquad |z| > \rho$$

with $h(0) = 1$. This shows that (4.47) and (4.48) hold. This power series converges in $|z| > \rho$, so that $H(z)$ has no poles in $|z| > \rho$, and in particular none in $|z| \ge 1$. Also, it follows that $|h(l)| \le M\rho_1^{l}$ for any $l \ge 0$, where $M > 0$ and $\rho < \rho_1 < 1$. Hence,

$$\sum_{l=0}^{\infty} |h(l)|^2 \le M^2 \sum_{l=0}^{\infty} \rho_1^{2l} < \infty$$

Moreover, from (4.51), we see that

$$[H(z)]^{-1} = \exp\Big\{-\sum_{l=1}^{\infty} c_l z^{-l}\Big\}$$

is analytic in $|z| > \rho$, and hence $H(z)$ has no zeros in $|z| > \rho$. This completes the proof that $H(z)$ is of minimal phase.


Since $\Phi_{yy}(\omega) \ge 0$ and $\log x < x$ for $x > 0$, it follows that $\log \Phi_{yy}(\omega) < \Phi_{yy}(\omega)$ holds. Thus we get

$$c_0 < \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{yy}(\omega)\, d\omega < \infty$$

This implies that $c_0$ is always bounded above. Thus, if $c_0$ is bounded below, the process is regular; on the other hand, if $c_0 = -\infty$, we have $\sigma^2 = 0$, so that the process becomes singular (or deterministic).
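The proof of Theorem 4.4 is constructive. The coefficients $c_l$ of (4.50) determine $\sigma^2 = e^{c_0}$, and through (4.51) they also determine the minimum-phase factor $H(z)$. Below is a minimal numerical sketch of this cepstral route. As in the proof, it assumes that $\log\Phi_{yy}$ is smooth on the unit circle, so that a Riemann sum on a uniform grid approximates (4.50) well. The grid size, truncation length and function name are illustrative choices. The exponential of the power series is expanded with the recursion $n\,h(n) = \sum_{k=1}^{n} k\,c_k\,h(n-k)$, which follows from differentiating $H = \exp(\cdot)$ with respect to $z^{-1}$.

```python
import math


def spectral_factor(log_phi, n_grid=512, n_coeffs=10):
    """Approximate sigma^2 and h(0), ..., h(n_coeffs-1) in (4.48).

    log_phi(omega) must return log Phi_yy(omega) for an even spectrum.
    The coefficients c_l of (4.50) are approximated by a Riemann sum on a
    uniform grid, and H(z) = exp(sum_{l>=1} c_l z^-l) of (4.51) is
    expanded through n h(n) = sum_{k=1}^{n} k c_k h(n-k).
    """
    grid = [2.0 * math.pi * k / n_grid for k in range(n_grid)]
    samples = [log_phi(w) for w in grid]
    c = [sum(s * math.cos(w * l) for s, w in zip(samples, grid)) / n_grid
         for l in range(n_coeffs)]
    sigma2 = math.exp(c[0])  # sigma^2 = e^{c_0}
    h = [1.0]                # h(0) = 1
    for n in range(1, n_coeffs):
        h.append(sum(k * c[k] * h[n - k] for k in range(1, n + 1)) / n)
    return sigma2, h
```

Take $\Phi_{yy}(\omega) = 2(1.25 + \cos\omega)$, which corresponds to $\sigma^2 = 2$ and $H(z) = 1 + 0.5z^{-1}$. For this spectrum the sketch recovers $\sigma^2 \approx 2$, $h(1) \approx 0.5$ and $h(i) \approx 0$ for $i \ge 2$.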

It should be noted that under the assumption $c_0 > -\infty$ of (4.46), $\Phi_{yy}(z)$ may have zeros on the unit circle, and hence the assumption of (4.46) is weaker than the analyticity of $\log \Phi_{yy}(z)$ in a neighborhood of the unit circle $|z| = 1$, as shown in the following example.
Example 4.9. Consider a simple MA process

$$y(t) = e(t) - e(t-1)$$

where $e$ is a zero mean white noise process with unit variance. Thus, we have

$$H(z) = 1 - z^{-1} \ \Rightarrow\ \Phi_{yy}(z) = 2 - (z + z^{-1}), \qquad \Phi_{yy}(\omega) = 2 - 2\cos\omega$$

It is easy to see that $\log \Phi_{yy}(z)|_{z=1} = -\infty$, so that $\log \Phi_{yy}(z)$ is not analytic in a neighborhood of $|z| = 1$. But, we can show that (see Problem 4.7)

$$\int_{-\pi}^{\pi} \log \Phi_{yy}(\omega)\, d\omega = \int_{-\pi}^{\pi} \log(2 - 2\cos\omega)\, d\omega = 0 > -\infty \tag{4.52}$$

Thus, the condition of (4.46) is satisfied. But, in this case, it is impossible to have the inverse representation

$$e(t) = \sum_{i=0}^{\infty} a_i y(t-i), \qquad \sum_{i=0}^{\infty} a_i^2 < \infty$$

In fact, the inverse $1/H(z)$ shows that $a_i = 1$, $i = 0, 1, \dots$; but the sequence $a = (1, 1, \dots)$ is not square summable.
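Equation (4.52) can be checked numerically. The sketch below uses a plain midpoint rule, which is an assumption of this illustration rather than anything in the text. It evaluates $2 - 2\cos\omega$ as $4\sin^2(\omega/2)$ to avoid cancellation near $\omega = 0$, where the integrand has an integrable logarithmic singularity. The result is close to $c_0 = 0$, so $\sigma^2 = e^{c_0} = 1$, which matches the unit innovation variance even though $\log\Phi_{yy}$ is unbounded at $\omega = 0$.

```python
import math


def mean_log_spectrum(n_points=100000):
    """Midpoint-rule estimate of (1/2pi) * integral of log(2 - 2cos w), -pi..pi.

    2 - 2cos(w) is evaluated as 4 sin^2(w/2) to avoid cancellation near
    w = 0; with an even n_points no midpoint falls exactly on w = 0.
    """
    step = 2.0 * math.pi / n_points
    total = 0.0
    for k in range(n_points):
        w = -math.pi + (k + 0.5) * step
        total += math.log(4.0 * math.sin(0.5 * w) ** 2)
    return total * step / (2.0 * math.pi)
```

The singularity at $\omega = 0$ limits the midpoint rule to roughly first-order accuracy, so the estimate is small but not zero to machine precision.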


Example 4.10. Consider a regular stationary process $y$. It follows from Theorems 4.3 and 4.4 that $y$ can be expressed as

$$y(t) = \sum_{i=0}^{\infty} h(i)\, e(t-i), \qquad H(z) = \sum_{i=0}^{\infty} h(i) z^{-i}$$

where we assume that $H(z)$ is of minimal phase. For $m > 0$, we consider the $m$-step prediction problem of the stationary process $y$. Since $\mathcal{Y}_t = \mathcal{E}_t$, the $m$-step predictor has the following two different expressions:

$$\hat{y}(t+m \mid t) = \hat{E}\{y(t+m) \mid \mathcal{Y}_t\} = \sum_{l=0}^{\infty} g_l\, y(t-l) \tag{4.53}$$

$$\hat{y}(t+m \mid t) = \hat{E}\{y(t+m) \mid \mathcal{E}_t\} = \sum_{l=0}^{\infty} f_l\, e(t-l) \tag{4.54}$$

In terms of the coefficients $\{g_l\}$ and $\{f_l\}$, define the transfer functions

$$G(z) = \sum_{l=0}^{\infty} g_l z^{-l}, \qquad F(z) = \sum_{l=0}^{\infty} f_l z^{-l}$$

We see that feeding $y$ into the inverse filter $1/H(z)$ yields the innovation process $e$ as shown in Figure 4.5. Thus, by using the filter $F(z)$, the optimal filter $G(z)$ is expressed as

$$G(z) = \frac{F(z)}{H(z)}$$


Hence, it suffices to obtain the optimal filter $F(z)$ acting on the innovation process $e$. We derive the optimal transfer function $F(z)$. Define the prediction error

$$\tilde{y}(t+m) = y(t+m) - \hat{y}(t+m \mid t)$$

Then, by using (4.54), the prediction error is expressed as

$$\begin{aligned}\tilde{y}(t+m) &= \sum_{i=0}^{\infty} h(i)\, e(t+m-i) - \sum_{i=0}^{\infty} f_i\, e(t-i) \\ &= \sum_{i=0}^{m-1} h(i)\, e(t+m-i) + \sum_{i=0}^{\infty} [h(i+m) - f_i]\, e(t-i)\end{aligned}$$

Since $e$ is a white noise with variance $\sigma^2$, the variance of $\tilde{y}(t+m)$ is written as

$$E\{\tilde{y}^2(t+m)\} = \sigma^2 \sum_{i=0}^{m-1} h^2(i) + \sigma^2 \sum_{i=0}^{\infty} [h(i+m) - f_i]^2 \tag{4.55}$$

Hence, the coefficients of the filter minimizing the variance of the estimation error are given by

$$f_i = h(i+m), \qquad i = 0, 1, \dots \tag{4.56}$$

This indicates that the optimal predictor has the form

$$\hat{y}(t+m \mid t) = \sum_{i=0}^{\infty} h(m+i)\, e(t-i) = \sum_{i=-\infty}^{t} h(t+m-i)\, e(i)$$
We compute the transfer function of the optimal predictor. It follows from (4.56) that

$$F(z) = \sum_{i=0}^{\infty} h(i+m) z^{-i} = h(m) + h(m+1) z^{-1} + \cdots$$

Multiplying $H(z)$ by $z^m$ yields

$$z^m H(z) = h(0) z^m + \cdots + h(m-1) z + h(m) + h(m+1) z^{-1} + \cdots$$

We see that $F(z)$ is equal to the causal part of $z^m H(z)$; the causal part is obtained by deleting the polynomial part $h(0) z^m + \cdots + h(m-1) z$. Let $[\,\cdot\,]_+$ denote the operation that retrieves the causal part. Then, we have $F(z) = [z^m H(z)]_+$, so that the optimal transfer function is given by

$$G(z) = \frac{[z^m H(z)]_+}{H(z)} \tag{4.57}$$

From (4.55), the associated minimal variance of the prediction error is given by

$$\sigma_m^2 = \sigma^2 \sum_{i=0}^{m-1} h^2(i)$$

Since $y(t+m) = \hat{y}(t+m \mid t) + \tilde{y}(t+m)$ with $\hat{y}(t+m \mid t) \perp \tilde{y}(t+m)$,

$$\sigma_m^2 = \sigma_y^2 - \sigma_{\hat{y}}^2$$

where $\sigma_{\hat{y}}^2 = E\{\hat{y}^2(t+m \mid t)\}$. Noting that $y = H(z)e$ and $\hat{y} = F(z)e$, the minimum variance is expressed as

$$\sigma_m^2 = \frac{\sigma^2}{2\pi} \int_{-\pi}^{\pi} \big(|H(e^{j\omega})|^2 - |F(e^{j\omega})|^2\big)\, d\omega$$

by using the formula (4.31).
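The recipe (4.56)-(4.57) is easy to carry out numerically once the impulse response $h(i)$ is available. The sketch below is illustrative. It assumes that $H(z)$ is a rational, minimal-phase transfer function given by hypothetical coefficient lists `num` and `den` in powers of $z^{-1}$. It computes $f_i = h(i+m)$, recovers $g_l$ from $F(z) = G(z)H(z)$ by forward substitution, and evaluates $\sigma_m^2 = \sigma^2\sum_{i=0}^{m-1}h^2(i)$. All series are truncated.

```python
def impulse_response(num, den, n_terms):
    """h(0), ..., h(n_terms-1) of H(z) = num(z^-1) / den(z^-1).

    num and den hold coefficients of z^0, z^-1, ...; den[0] must be nonzero.
    """
    h = []
    for n in range(n_terms):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * h[n - k]
        h.append(acc / den[0])
    return h


def m_step_predictor(h, m, n_terms):
    """Truncated coefficients of F(z) and G(z) for the m-step predictor.

    f_i = h(i + m) is (4.56); g is obtained from F(z) = G(z) H(z), i.e.
    f_n = sum_k g_k h(n - k), solved by forward substitution. The list h
    must contain at least n_terms + m coefficients.
    """
    f = [h[i + m] for i in range(n_terms)]
    g = []
    for n in range(n_terms):
        acc = f[n] - sum(g[k] * h[n - k] for k in range(n))
        g.append(acc / h[0])
    return f, g


def prediction_error_variance(h, m, sigma2=1.0):
    """sigma_m^2 = sigma^2 * sum_{i=0}^{m-1} h(i)^2, from (4.55)-(4.56)."""
    return sigma2 * sum(hi * hi for hi in h[:m])
```

For the AR(1) case $H(z) = 1/(1 - 0.8z^{-1})$, i.e. `impulse_response([1.0], [1.0, -0.8], 60)`, the sketch gives $g_0 = 0.8^m$ and $g_l = 0$ for $l \ge 1$. This means $\hat{y}(t+m \mid t) = 0.8^m y(t)$, with $\sigma_2^2 = 1.64\sigma^2$.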


4.6 Stochastic Linear Systems 


We consider a stochastic linear system described by the state space model

$$x(t+1) = A(t)x(t) + w(t) \tag{4.58a}$$

$$y(t) = C(t)x(t) + v(t), \qquad t = 0, 1, \dots \tag{4.58b}$$

where $x \in \mathbb{R}^n$ is the state vector, $y \in \mathbb{R}^p$ the observation vector, $w \in \mathbb{R}^n$ the plant noise vector, and $v \in \mathbb{R}^p$ the observation noise vector. Also, $A(t) \in \mathbb{R}^{n\times n}$ and $C(t) \in \mathbb{R}^{p\times n}$ are deterministic functions of time $t$. Moreover, $w$ and $v$ are zero mean Gaussian white noise vectors with covariance matrices

$$E\left\{\begin{bmatrix} w(t) \\ v(t)\end{bmatrix}\begin{bmatrix} w^T(s) & v^T(s)\end{bmatrix}\right\} = \begin{bmatrix} Q(t) & S(t) \\ S^T(t) & R(t)\end{bmatrix}\delta_{ts} \tag{4.59}$$

where $Q(t) \in \mathbb{R}^{n\times n}$ is nonnegative definite, and $R(t) \in \mathbb{R}^{p\times p}$ is positive definite for all $t = 0, 1, \dots$. The initial state $x(0)$ is Gaussian with mean $E\{x(0)\} = \mu_x(0)$ and covariance matrix

$$E\{[x(0) - \mu_x(0)][x(0) - \mu_x(0)]^T\} = \Pi(0)$$

and is uncorrelated with the noises $w(t), v(t)$, $t = 0, 1, \dots$. The system described by (4.58) is schematically shown in Figure 4.6. This model is also called a Markov model for the process $y$.

Figure 4.6. Stochastic linear state space system

In order to study the statistical properties of the state vector $x(t)$ of (4.58), we define the state transition matrix

$$\Phi(t,s) = \begin{cases} A(t-1)A(t-2)\cdots A(s), & t > s \\ I, & t = s \end{cases} \tag{4.60}$$

For any $k \le s \le t$, it follows that

$$\Phi(t,k) = \Phi(t,s)\Phi(s,k) \tag{4.61}$$

In terms of the transition matrix, the solution of (4.58a) is written as

$$x(t) = \Phi(t,s)x(s) + \sum_{k=s}^{t-1}\Phi(t,k+1)w(k) \tag{4.62}$$

Then, we can easily prove the following lemma that characterizes the process $x(t)$.
Lemma 4.7. The process $x$ of (4.58a) is a Gauss-Markov process.
Proof. Putting $s = 0$ in (4.62),

$$x(t) = \Phi(t,0)x(0) + \sum_{k=0}^{t-1}\Phi(t,k+1)w(k) \tag{4.63}$$

This shows that $x(t)$ is a linear combination of a Gaussian random vector $x(0)$ and the noises $\{w(0), \dots, w(t-1)\}$, so that $x(t)$ is a Gaussian random vector. Thus $x$ is a Gaussian process. Suppose that $s < t$. Then, we see from (4.62) that $x(t)$ is also a linear combination of $x(s), w(s), \dots, w(t-1)$, and that $\{w(s), \dots, w(t-1)\}$ are Gaussian white noises independent of $x(s), x(s-1), \dots, x(0)$. Hence, we have

$$p\big(x(t) \mid x(s), x(s-1), \dots, x(0)\big) = p\big(x(t) \mid x(s)\big), \qquad t > s$$

This implies that $\{x(t),\ t = 0, 1, \dots\}$ is a Markov process.

It should be noted that a Gaussian process is characterized by its mean and covariance matrix

$$\mu_x(t) = E\{x(t)\}, \qquad \Lambda_{xx}(t,s) = E\{[x(t) - \mu_x(t)][x(s) - \mu_x(s)]^T\}$$
Lemma 4.8. The mean vector and the covariance matrix of the state process $x$ of (4.58a) are respectively given by

$$\mu_x(t) = \Phi(t,0)\mu_x(0) \tag{4.64}$$

and

$$\Lambda_{xx}(t,s) = \begin{cases}\Phi(t,s)\Pi(s), & t \ge s \\ \Pi(t)\Phi^T(s,t), & t < s\end{cases} \tag{4.65}$$

where $\Pi(t) := \Lambda_{xx}(t,t) = \operatorname{cov}\{x(t)\}$ is the state covariance matrix that satisfies

$$\Pi(t) = \Phi(t,0)\Pi(0)\Phi^T(t,0) + \sum_{k=0}^{t-1}\Phi(t,k+1)Q(k)\Phi^T(t,k+1) \tag{4.66}$$

Proof. Taking the expectation of both sides of (4.63) immediately yields (4.64). We prove (4.65). Suppose that $t \ge s$. Then it follows from (4.63) that

$$\Lambda_{xx}(t,s) = E\Big\{\Big[\Phi(t,0)[x(0) - \mu_x(0)] + \sum_{l=0}^{t-1}\Phi(t,l+1)w(l)\Big]\Big[\Phi(s,0)[x(0) - \mu_x(0)] + \sum_{k=0}^{s-1}\Phi(s,k+1)w(k)\Big]^T\Big\}$$

Expanding the right-hand side of the above equation and using (4.59) yield

$$\Lambda_{xx}(t,s) = \Phi(t,0)\Pi(0)\Phi^T(s,0) + \sum_{k=0}^{s-1}\Phi(t,k+1)Q(k)\Phi^T(s,k+1)$$

Putting $s = t$ gives (4.66). Since $\Phi(t,0) = \Phi(t,s)\Phi(s,0)$ and $\Phi(t,k+1) = \Phi(t,s)\Phi(s,k+1)$ for $k < s$, we see from the above equation that (4.65) holds for $t \ge s$. Similarly, we can prove (4.65) for $t < s$.


It can be shown that the state covariance matrix $\Pi(t)$ satisfies

$$\Pi(t+1) = A(t)\Pi(t)A^T(t) + Q(t), \qquad t = 0, 1, \dots \tag{4.67}$$

Thus, for a given initial condition $\Pi(0) = \operatorname{cov}\{x(0)\}$, we can recursively compute the covariance matrix $\Pi(t)$ for $t = 1, 2, \dots$.
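The recursion (4.67) and the closed form (4.66) must agree. The minimal pure-Python sketch below propagates $\Pi(t)$ both ways for a hypothetical time-varying pair $A(t)$, $Q(t)$ that is chosen only for illustration. The small matrix helpers stand in for a linear algebra library, and all function names are assumptions of this example.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition(A_seq, t, s):
    """Phi(t, s) = A(t-1) ... A(s) for t > s, and I for t = s, as in (4.60)."""
    Phi = identity(len(A_seq[0]))
    for k in range(s, t):
        Phi = matmul(A_seq[k], Phi)
    return Phi


def covariance_recursive(A_seq, Q_seq, Pi0, t_final):
    """Pi(t_final) from the recursion (4.67)."""
    Pi = Pi0
    for t in range(t_final):
        Pi = add(matmul(matmul(A_seq[t], Pi), transpose(A_seq[t])), Q_seq[t])
    return Pi


def covariance_closed_form(A_seq, Q_seq, Pi0, t):
    """Pi(t) from the explicit sum (4.66)."""
    Phi = transition(A_seq, t, 0)
    Pi = matmul(matmul(Phi, Pi0), transpose(Phi))
    for k in range(t):
        Phk = transition(A_seq, t, k + 1)
        Pi = add(Pi, matmul(matmul(Phk, Q_seq[k]), transpose(Phk)))
    return Pi


def example_system(T):
    """Hypothetical time-varying A(t), Q(t), t = 0..T-1 (illustration only)."""
    A_seq = [[[0.9, 0.2], [-0.1, 0.8 + 0.01 * t]] for t in range(T)]
    Q_seq = [[[1.0, 0.0], [0.0, 0.5 + 0.1 * t]] for t in range(T)]
    return A_seq, Q_seq
```

The recursion needs one update per step, while the closed form re-evaluates transition matrices. This is why (4.67) is the form used in practice.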


Lemma 4.9. The process $y$ defined by (4.58) is a Gaussian process, whose mean vector $\mu_y(t)$ and covariance matrix $\Lambda_{yy}(t,s)$ are respectively given by

$$\mu_y(t) = C(t)\mu_x(t) = C(t)\Phi(t,0)\mu_x(0) \tag{4.68}$$

and

$$\Lambda_{yy}(t,s) = \begin{cases} C(t)\Phi(t,s)\Pi(s)C^T(s) + C(t)\Phi(t,s+1)S(s), & t > s \\ C(t)\Pi(t)C^T(t) + R(t), & t = s \\ \Lambda_{yy}^T(s,t), & t < s\end{cases} \tag{4.69}$$

Proof. Equation (4.68) is obvious from (4.58b) and (4.64). To prove (4.69), we assume that $t > s$. It follows from (4.58b), (4.62) and (4.68) that

$$\begin{aligned}y(t) - \mu_y(t) &= C(t)[x(t) - \mu_x(t)] + v(t) \\ &= C(t)\Phi(t,s)[x(s) - \mu_x(s)] + C(t)\sum_{k=s}^{t-1}\Phi(t,k+1)w(k) + v(t)\end{aligned}$$

Thus we have

$$\begin{aligned}\Lambda_{yy}(t,s) &= E\{[y(t) - \mu_y(t)][y(s) - \mu_y(s)]^T\} \\ &= E\Big\{\Big[C(t)\Phi(t,s)\tilde{x}(s) + C(t)\sum_{k=s}^{t-1}\Phi(t,k+1)w(k) + v(t)\Big]\big[C(s)\tilde{x}(s) + v(s)\big]^T\Big\}\end{aligned}$$

where $\tilde{x}(t) := x(t) - \mu_x(t)$. From (4.59) and the fact that $E\{w(k)\tilde{x}^T(s)\} = 0$ and $E\{v(k)\tilde{x}^T(s)\} = 0$ for $k \ge s$, we obtain the first equation of (4.69), and the same computation with $t = s$ gives the second. Similarly, for $t < s$, the third equation follows from $\Lambda_{yy}(t,s) = \Lambda_{yy}^T(s,t)$.


4.7 Stochastic Linear Time-Invariant Systems 


In this section, we consider a stochastic LTI system, where $A(t), C(t), Q(t), R(t), S(t)$ in (4.58) and (4.59) are independent of time $t$.
Consider a stochastic LTI system described by

$$x(t+1) = Ax(t) + w(t) \tag{4.70a}$$

$$y(t) = Cx(t) + v(t), \qquad t = t_0, t_0+1, \dots \tag{4.70b}$$

where $t_0$ is the initial time, and $x(t_0)$ is a Gaussian random vector with mean $\mu_x(t_0)$ and covariance matrix $\Pi(t_0)$.

We see from (4.60) that the state transition matrix becomes $\Phi(t,s) = A^{t-s}$, $t \ge s$. It thus follows from (4.64) and (4.66) that the mean vector is given by

$$\mu_x(t) = A^{t-t_0}\mu_x(t_0) \tag{4.71}$$

and the state covariance matrix becomes

$$\begin{aligned}\Pi(t) &= A^{t-t_0}\Pi(t_0)(A^T)^{t-t_0} + \sum_{k=t_0}^{t-1}A^{t-k-1}Q(A^T)^{t-k-1} \\ &= A^{t-t_0}\Pi(t_0)(A^T)^{t-t_0} + \sum_{k=0}^{t-t_0-1}A^kQ(A^T)^k\end{aligned} \tag{4.72}$$

Also, from (4.72) and (4.67), $\Pi(t)$ satisfies

$$\Pi(t+1) = A\Pi(t)A^T + Q, \qquad t = t_0, t_0+1, \dots \tag{4.73}$$


Lemma 4.10. Suppose that $A$ in (4.70a) is stable, i.e., $\rho(A) < 1$. Letting $t_0 \to -\infty$, the process $x$ becomes a stationary Gauss-Markov process with mean zero and covariance matrix

$$\Lambda_{xx}(l) = \begin{cases}A^l\Pi, & l \ge 0 \\ \Pi(A^T)^{-l}, & l < 0\end{cases} \tag{4.74}$$

where $\Pi$ is the unique solution of the Lyapunov equation

$$\Pi = A\Pi A^T + Q \tag{4.75}$$

Proof. Since $A$ is stable, we get $\lim_{t_0\to-\infty}A^{t-t_0} = 0$. Thus, from (4.71),

$$\lim_{t_0\to-\infty}\mu_x(t) = 0$$

Also, taking $t_0 \to -\infty$ in (4.72),

$$\lim_{t_0\to-\infty}\Pi(t) = \sum_{k=0}^{\infty}A^kQ(A^T)^k =: \Pi$$

It can be shown that $\Pi$ satisfies (4.75), whose uniqueness is proved in Theorem 3.3. Since the right-hand side of (4.74) is a function of the time difference only, the process $x$ is stationary. The Gauss-Markov property of $x$ follows from Lemma 4.7.
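The proof identifies $\Pi$ with the series $\sum_{k\ge0}A^kQ(A^T)^k$, and the partial sums of this series are exactly the iterates of (4.73) started from $\Pi = 0$. The sketch below runs that fixed-point iteration until successive iterates agree. It is a simple illustration that assumes $\rho(A) < 1$, not an efficient Lyapunov solver; the doubling algorithm of Problem 4.11 converges much faster. Matrices are nested lists, and the helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def solve_lyapunov(A, Q, tol=1e-13, max_iter=100000):
    """Fixed-point iteration Pi <- A Pi A^T + Q of (4.73), started at Pi = 0.

    The k-th iterate is the partial sum sum_{i<k} A^i Q (A^T)^i, which
    converges to the solution of (4.75) when rho(A) < 1.
    """
    n = len(A)
    At = transpose(A)
    Pi = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        new = add(matmul(matmul(A, Pi), At), Q)
        change = max(abs(new[i][j] - Pi[i][j])
                     for i in range(n) for j in range(n))
        Pi = new
        if change < tol:
            return Pi
    raise RuntimeError("no convergence; is A stable?")


def lyapunov_residual(A, Q, Pi):
    """Largest entry of |Pi - (A Pi A^T + Q)|."""
    rhs = add(matmul(matmul(A, Pi), transpose(A)), Q)
    n = len(A)
    return max(abs(Pi[i][j] - rhs[i][j]) for i in range(n) for j in range(n))
```

In the scalar case $A = 0.8$, $Q = 1$, the iteration converges to $\Pi = 1/(1 - 0.64) \approx 2.778$.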


Lemma 4.11. Suppose that $A$ is stable. For $t_0 \to -\infty$, the process $y$ of (4.70b) becomes a stationary Gaussian process with mean zero and covariance matrix

$$\Lambda_{yy}(l) = \begin{cases}CA^{l-1}\bar{C}^T, & l > 0 \\ C\Pi C^T + R, & l = 0 \\ \bar{C}(A^T)^{-l-1}C^T, & l < 0\end{cases} \tag{4.76}$$

where $\bar{C}$ is defined by

$$\bar{C}^T = A\Pi C^T + S \tag{4.77}$$

Proof. As in Lemma 4.10, it can easily be shown that, since $A$ is stable, $y$ of (4.70b) becomes a stationary Gaussian process with mean zero as $t_0 \to -\infty$. From Lemmas 4.9 and 4.10, the covariance matrix of $y$ becomes

$$\Lambda_{yy}(l) = \begin{cases}CA^l\Pi C^T + CA^{l-1}S, & l > 0 \\ C\Pi C^T + R, & l = 0 \\ C\Pi(A^T)^{-l}C^T + S^T(A^T)^{-l-1}C^T, & l < 0\end{cases}$$

This reduces to (4.76) by using $\bar{C}$ of (4.77).


Example 4.11. For the Markov model (4.70), the matrices $A$, $C$, $\bar{C}$ are expressed as

$$A = E\{x(t+1)x^T(t)\}\Pi^{-1}, \qquad C = E\{y(t)x^T(t)\}\Pi^{-1}, \qquad \bar{C} = E\{y(t)x^T(t+1)\}$$

In fact, post-multiplying (4.70a) by $x^T(t)$ and noting that $E\{w(t)x^T(t)\} = 0$, we have

$$E\{x(t+1)x^T(t)\} = AE\{x(t)x^T(t)\} = A\Pi$$

showing that the first relation holds. The second relation is proved similarly by using (4.70b). Finally, from (4.70),

$$E\{y(t)x^T(t+1)\} = E\{[Cx(t) + v(t)][x^T(t)A^T + w^T(t)]\} = C\Pi A^T + S^T = \bar{C}$$

This completes the proof.


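Example 4.12. We compute the spectral density matrix $\Phi_{yy}(z)$ of $y$ with covariance matrix (4.76). We assume for simplicity that $S = 0$, so that $\bar{C}^T = A\Pi C^T$. Thus, we have

$$\Lambda_{yy}(l) = \begin{cases}C\Lambda_{xx}(l)C^T, & l \ne 0 \\ C\Lambda_{xx}(0)C^T + R, & l = 0\end{cases}$$

so that the spectral density matrix is given by

$$\Phi_{yy}(z) = C\Phi_{xx}(z)C^T + R \tag{4.78}$$

where $\Phi_{xx}(z)$ is the spectral density matrix of $x$. It follows from (4.37) and (4.74) that

$$\begin{aligned}\Phi_{xx}(z) &= \sum_{l=-\infty}^{\infty}\Lambda_{xx}(l)z^{-l} = \sum_{l=-\infty}^{-1}\Pi(A^T)^{-l}z^{-l} + \Pi + \sum_{l=1}^{\infty}A^l\Pi z^{-l} \\ &= \Pi + \Pi\Big(\sum_{l=1}^{\infty}(A^T)^lz^l\Big) + \Big(\sum_{l=1}^{\infty}A^lz^{-l}\Big)\Pi\end{aligned} \tag{4.79}$$

Let $\rho := \rho(A)$. Since $A$ is stable, we have $0 \le \rho < 1$, so that

$$\sum_{l=1}^{\infty}z^{-l}A^l = (zI - A)^{-1}A, \qquad |z| > \rho$$

and

$$\sum_{l=1}^{\infty}z^l(A^T)^l = A^T(z^{-1}I - A^T)^{-1}, \qquad |z| < \rho^{-1}$$

This shows that the right-hand side of (4.79) is absolutely convergent for $\rho < |z| < \rho^{-1}$. Thus, by using the Lyapunov equation (4.75),

$$\begin{aligned}\Phi_{xx}(z) &= \Pi + \Pi A^T(z^{-1}I - A^T)^{-1} + (zI - A)^{-1}A\Pi \\ &= (zI - A)^{-1}\big[(zI - A)\Pi(z^{-1}I - A^T) + (zI - A)\Pi A^T + A\Pi(z^{-1}I - A^T)\big](z^{-1}I - A^T)^{-1} \\ &= (zI - A)^{-1}(\Pi - A\Pi A^T)(z^{-1}I - A^T)^{-1} \\ &= (zI - A)^{-1}Q(z^{-1}I - A^T)^{-1}\end{aligned}$$

Also, let $W(z) = C(zI - A)^{-1}$. Then $\Phi_{yy}(z)$ is expressed as

$$\Phi_{yy}(z) = R + W(z)QW^T(z^{-1}) \tag{4.80}$$

This extends (4.29) to a multivariable LTI system. If $S \ne 0$, (4.80) becomes

$$\Phi_{yy}(z) = R + W(z)S + S^TW^T(z^{-1}) + W(z)QW^T(z^{-1}) \tag{4.81}$$

For a proof of (4.81), see Problem 4.12.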
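In the scalar case, (4.80) reads $\Phi_{yy}(e^{j\omega}) = R + C^2Q/|e^{j\omega} - A|^2$. The sketch below compares this with a truncated sum of the covariances (4.76), using illustrative values $A = 0.7$, $C = 1.3$, $Q = 0.5$, $R = 0.2$ and $S = 0$. These numbers and the function names are assumptions. The truncation length is harmless because $|A|^l$ decays geometrically.

```python
import cmath
import math


def phi_from_covariances(a, c, q, r, omega, lags=400):
    """Truncated sum of Lambda_yy(l) e^{-j omega l} for the scalar model, S = 0."""
    pi_x = q / (1.0 - a * a)        # scalar solution of (4.75)
    total = c * c * pi_x + r         # Lambda_yy(0)
    for l in range(1, lags + 1):
        lam = c * c * a ** l * pi_x  # Lambda_yy(l) = Lambda_yy(-l) = C A^l Pi C
        total += 2.0 * lam * math.cos(omega * l)
    return total


def phi_from_transfer_function(a, c, q, r, omega):
    """R + W(z) Q W(1/z) with W(z) = C (z - A)^{-1}, evaluated at z = e^{j omega}."""
    z = cmath.exp(1j * omega)
    w = c / (z - a)
    w_rev = c / (1.0 / z - a)
    return (r + w * q * w_rev).real
```

At $\omega = 0$ both functions reduce to $R + C^2Q/(1 - A)^2$.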


4.8 Backward Markov Models 


In the previous section, we have shown that the stochastic LTI system defined by 
(4.70) generates a stationary process y, so that the system of (4.70) is often called 
a Markov model for the stationary process y. In this section, we introduce a dual 
Markov model for the stationary process y; the dual model is also called a backward 
Markov model corresponding to the forward Markov model. 

For the Markov model of (4.70), we assume that the state covariance matrix $\Pi = E\{x(t)x^T(t)\}$ of (4.75) is positive definite. Then, we define

$$w_b(t) := \Pi^{-1}x(t) - A^T\Pi^{-1}x(t+1) \tag{4.82}$$

It follows from (4.74) that

$$\begin{aligned}E\{w_b(t)x^T(t+l)\} &= \Pi^{-1}E\{x(t)x^T(t+l)\} - A^T\Pi^{-1}E\{x(t+1)x^T(t+l)\} \\ &= \Pi^{-1}\Lambda_{xx}(-l) - A^T\Pi^{-1}\Lambda_{xx}(1-l) \\ &= (A^T)^l - A^T(A^T)^{l-1} = 0, \qquad l = 1, 2, \dots\end{aligned}$$

Hence, $w_b(t)$ defined above is orthogonal to the future $x(t+l)$, $l = 1, 2, \dots$, so that it is a backward white noise. In fact, by definition, since

$$w_b(t+l) \in \operatorname{span}\{x(t+l), x(t+l+1)\}$$

we see that $w_b(t)$ is orthogonal to $w_b(t+l)$. This implies that

$$E\{w_b(t+l)w_b^T(t)\} = 0, \qquad l \ne 0$$


Motivated by the above observation, we prove a lemma that gives a backward 
Markov model. 


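Lemma 4.12. Define $x_b(t) = \bar{\Pi}x(t+1)$ with $\bar{\Pi} = \Pi^{-1}$. Then, the model with $x_b$ as the state vector

$$x_b(t-1) = A^Tx_b(t) + w_b(t) \tag{4.83a}$$

$$y(t) = \bar{C}x_b(t) + v_b(t) \tag{4.83b}$$

is a backward Markov model for the stationary process $y$, where $\bar{C} \in \mathbb{R}^{p\times n}$ is called the backward output matrix, and $w_b$ and $v_b$ are zero mean white noises with covariance matrices

$$E\left\{\begin{bmatrix}w_b(t)\\ v_b(t)\end{bmatrix}\begin{bmatrix}w_b^T(s) & v_b^T(s)\end{bmatrix}\right\} = \begin{bmatrix}\bar{Q} & \bar{S}\\ \bar{S}^T & \bar{R}\end{bmatrix}\delta_{ts} \tag{4.84}$$

Moreover, we have $\operatorname{cov}\{x_b(t)\} = \bar{\Pi}$ and

$$\bar{Q} = \bar{\Pi} - A^T\bar{\Pi}A, \qquad \bar{S} = C^T - A^T\bar{\Pi}\bar{C}^T, \qquad \bar{R} = \Lambda_{yy}(0) - \bar{C}\bar{\Pi}\bar{C}^T \tag{4.85}$$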


Proof. Equation (4.83a) is immediate from (4.82). We show that the following relations hold.

$$E\{w_b(t)x_b^T(t+l-1)\} = 0, \qquad E\{w_b(t)y^T(t+l)\} = 0, \qquad l = 1, 2, \dots \tag{4.86}$$

$$E\{v_b(t)x_b^T(t+l-1)\} = 0, \qquad E\{v_b(t)y^T(t+l)\} = 0, \qquad l = 1, 2, \dots \tag{4.87}$$

(i) The first relation of (4.86) follows from the fact that $w_b(t) \perp x(t+l)$, $l = 1, 2, \dots$ and $x(t+l) = \Pi x_b(t+l-1)$. We show the second relation in (4.86). From (4.82),

$$\begin{aligned}E\{w_b(t)y^T(t+l)\} &= \Pi^{-1}E\{x(t)[Cx(t+l) + v(t+l)]^T\} - A^T\Pi^{-1}E\{x(t+1)[Cx(t+l) + v(t+l)]^T\} \\ &= \Pi^{-1}E\{x(t)x^T(t+l)\}C^T + \Pi^{-1}E\{x(t)v^T(t+l)\} \\ &\quad - A^T\Pi^{-1}E\{x(t+1)x^T(t+l)\}C^T - A^T\Pi^{-1}E\{x(t+1)v^T(t+l)\}\end{aligned}$$

Since $v(t+l) \perp \{x(t+1), x(t)\}$, $l = 1, 2, \dots$, the second and the fourth terms in the above equation vanish. Thus, it follows from (4.74) that for $l = 1, 2, \dots$,

$$E\{w_b(t)y^T(t+l)\} = \Pi^{-1}\Pi(A^T)^lC^T - A^T\Pi^{-1}\Pi(A^T)^{l-1}C^T = 0$$

as was to be proved.
(ii) From (4.83b) and $x_b(t-1) = \Pi^{-1}x(t)$,

$$\begin{aligned}E\{v_b(t)x_b^T(t+l-1)\} &= E\{[y(t) - \bar{C}x_b(t)]x^T(t+l)\}\Pi^{-1} \\ &= E\{y(t)x^T(t+l)\}\Pi^{-1} - \bar{C}\Pi^{-1}E\{x(t+1)x^T(t+l)\}\Pi^{-1}\end{aligned} \tag{4.88}$$

The first term on the right-hand side of the above equation is reduced to

$$\begin{aligned}E\{y(t)x^T(t+l)\}\Pi^{-1} &= E\{y(t)[Ax(t+l-1) + w(t+l-1)]^T\}\Pi^{-1} \\ &= E\{y(t)x^T(t+l-1)\}A^T\Pi^{-1} + E\{y(t)w^T(t+l-1)\}\Pi^{-1} \\ &= E\{y(t)x^T(t+l-1)\}A^T\Pi^{-1}\end{aligned}$$

where we have used the fact that $E\{y(t)w^T(t+l-1)\} = 0$, $l = 2, 3, \dots$. Repeating this procedure gives

$$E\{y(t)x^T(t+l)\}\Pi^{-1} = E\{y(t)x^T(t+l-2)\}(A^T)^2\Pi^{-1} = \cdots = E\{y(t)x^T(t+1)\}(A^T)^{l-1}\Pi^{-1} = \bar{C}(A^T)^{l-1}\Pi^{-1}$$

Also, from (4.74), the second term on the right-hand side of (4.88) becomes

$$\bar{C}\Pi^{-1}E\{x(t+1)x^T(t+l)\}\Pi^{-1} = \bar{C}\Pi^{-1}\Pi(A^T)^{l-1}\Pi^{-1} = \bar{C}(A^T)^{l-1}\Pi^{-1} \tag{4.89}$$

Thus, it follows that the right-hand side of (4.88) vanishes, implying that the first equation in (4.87) is proved. Similarly, we can prove the second equation. In fact, we see from (4.83b) and (4.70b) that

$$\begin{aligned}E\{v_b(t)y^T(t+l)\} &= E\{[y(t) - \bar{C}x_b(t)]y^T(t+l)\} \\ &= E\{y(t)y^T(t+l)\} - \bar{C}\Pi^{-1}E\{x(t+1)[Cx(t+l) + v(t+l)]^T\} \\ &= E\{y(t)y^T(t+l)\} - \bar{C}\Pi^{-1}E\{x(t+1)x^T(t+l)\}C^T\end{aligned}$$

By using (4.74) and (4.76), we see that the right-hand side of the above equation vanishes for $l = 1, 2, \dots$.


Having proved (4.86) and (4.87), we can easily show that $w_b$ and $v_b$ are white noises. By using (4.87),

$$E\{v_b(t)v_b^T(t+l)\} = E\{v_b(t)[y(t+l) - \bar{C}x_b(t+l)]^T\} = 0, \qquad l = 1, 2, \dots$$

so that $v_b(t)$ is a white noise. Similarly, we see from (4.86) and (4.87) that for $l = 1, 2, \dots$,

$$E\{v_b(t)w_b^T(t+l)\} = E\{v_b(t)[x_b(t+l-1) - A^Tx_b(t+l)]^T\} = 0$$

$$E\{w_b(t)v_b^T(t+l)\} = E\{w_b(t)[y(t+l) - \bar{C}x_b(t+l)]^T\} = 0$$

Hence, $w_b(t+l) \perp v_b(t)$ holds for $l \ne 0$. Thus we have shown that $(w_b(t), v_b(t))$ are jointly white noises. Finally, it can easily be shown that $\operatorname{cov}\{x_b(t)\} = \bar{\Pi}$ and (4.85) hold.
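In the scalar case, the formulas (4.85) can be cross-checked directly against the definitions (4.82) and (4.83b). The sketch below uses illustrative forward parameters $A = 0.6$, $C = 1.5$, $Q = 0.8$, $R = 0.5$, $S = 0.2$; these are assumptions, not values from the text. It first builds the covariance of $(x(t), x(t+1), y(t))$ implied by the stationary forward model. It then writes $w_b(t)$ and $v_b(t)$ as linear combinations of these three variables and compares their second moments with $\bar{Q}$, $\bar{S}$, $\bar{R}$. The function names are hypothetical.

```python
def forward_joint_cov(a, c, q, r, s):
    """Covariance of (x(t), x(t+1), y(t)) for the stationary scalar model (4.70)."""
    pi = q / (1.0 - a * a)        # (4.75), scalar case
    cbar = c * pi * a + s         # (4.77): Cbar^T = A Pi C^T + S
    sigma = [
        [pi,     a * pi, c * pi],          # x(t)
        [a * pi, pi,     cbar],            # x(t+1)
        [c * pi, cbar,   c * c * pi + r],  # y(t)
    ]
    return pi, cbar, sigma


def quad(u, sigma, v):
    """u^T Sigma v, the covariance of the combinations u.z and v.z."""
    return sum(u[i] * sigma[i][j] * v[j] for i in range(3) for j in range(3))


def backward_parameters(a, c, q, r, s):
    pi, cbar, sigma = forward_joint_cov(a, c, q, r, s)
    pib = 1.0 / pi                # Pi_bar = Pi^{-1}
    formula = {                   # (4.85)
        "Qb": pib - a * pib * a,
        "Sb": c - a * pib * cbar,
        "Rb": (c * c * pi + r) - cbar * pib * cbar,
    }
    wb = [pib, -a * pib, 0.0]     # w_b(t) = Pi^{-1} x(t) - A^T Pi^{-1} x(t+1), (4.82)
    vb = [0.0, -cbar * pib, 1.0]  # v_b(t) = y(t) - Cbar Pi^{-1} x(t+1), (4.83b)
    direct = {
        "Qb": quad(wb, sigma, wb),
        "Sb": quad(wb, sigma, vb),
        "Rb": quad(vb, sigma, vb),
    }
    return formula, direct
```

Because $(w_b, v_b)$ is a linear image of a valid covariance, the resulting $2\times 2$ block (4.84) is nonnegative definite as well.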


The backward Markov models introduced above, together with forward Markov models, play important roles in the analysis and modeling of stationary stochastic processes. In particular, the backward Markov model is used to prove the positive real lemma in Chapter 7, and it is also instrumental in deriving a balanced (reduced) stochastic realization of a stationary stochastic process in Chapter 8.


4.9 Notes and References 


• A large number of books and papers are available on stochastic processes and systems. Sections 4.1 and 4.2, which give an introduction to stochastic processes, are based on [13, 44, 68, 134, 142], where the last one contains many practical examples.

• Section 4.3 has discussed ergodic properties of stochastic processes based on [123]. For spectral analysis of stationary stochastic processes in Section 4.4, see [123, 134, 150].

• In Section 4.5, we have introduced Hilbert spaces generated by stochastic processes, and then stated the Wold decomposition theorem; see [44, 95]. This theorem is needed in developing a stochastic realization theory in the presence of exogenous inputs in Chapter 9. The regularity condition of stationary processes due to Szegő [65] is proved under a certain restricted condition [134, 178], and the linear prediction theory is briefly discussed. Some advanced results on prediction theory are found in [115, 179, 180]. Other related references for this section are the books [13, 33, 138].

• Sections 4.6 and 4.7 have dealt with stochastic linear dynamical systems, or Markov models, based on [11, 68, 144]. Moreover, Section 4.8 derives dual or backward Markov models for stationary stochastic processes based on [39, 42, 106].
4.10 Problems


4.1 Prove the following identity for double sums.

$$\sum_{i=1}^{N}\sum_{j=1}^{N}\phi(i-j) = \sum_{k=-N+1}^{N-1}(N - |k|)\,\phi(k)$$

4.2 Prove (4.13).
4.3 Prove the formulas for Cesàro sums.
(a) $\lim_{k\to\infty} a_k = 0 \ \Rightarrow\ \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N} a_k = 0$


Le he k 
b lim — =0 => lim - da = 
(b) Tim — Slax dim => ( 7) a 


k=1 k=1 


(c) $\lim_{N\to\infty}\sum_{k=1}^{N} a_k = S \ \Rightarrow\ \lim_{N\to\infty}\sum_{k=1}^{N}\Big(1 - \frac{k}{N}\Big)a_k = S$
4.4 Prove Theorem 4.2.
4.5 Prove Lemma 4.3 (ii) by means of the relation $\Phi_{xx}(\omega) = \lim_{N\to\infty} I_N(\omega)$, where $I_N(\omega)$ is given by

$$I_N(\omega) = \frac{1}{N}E\Big\{\Big(\sum_{l=1}^{N}e^{-j\omega l}x(l)\Big)\Big(\sum_{l=1}^{N}e^{-j\omega l}x(l)\Big)^{*}\Big\} \ge 0$$

4.6 For the linear system shown in Figure 4.1, show that the relation

$$\Phi_{yu}(\omega) = G(e^{j\omega})\Phi_{uu}(\omega)$$

holds, where the cross-spectral density function $\Phi_{yu}(\omega)$ is the Fourier transform of $\Lambda_{yu}(l)$. (Hint: See Lemma 4.4.)

4.7 Prove (4.52).

4.8 Suppose that the spectral density function of an ARMA process $y$ is given by

$$\Phi_{yy}(\omega) = \frac{1.25 + \cos\omega}{1.81 - 1.8\cos\omega}$$

Obtain the difference equation satisfied by $y$.

4.9 By using the result of Example 4.10, solve the $m$-step prediction problem for the ARMA process

$$y(t) + ay(t-1) = e(t) + ce(t-1), \qquad |a| < 1,\ |c| < 1$$

where $m > 0$ and $a \ne c$.
4.10 Show that $y$ in (4.58) is not a Markov process, but the joint process $(x, y)$ is a Markov process.

4.11 Let $\Pi(t)$, $t = 1, 2, \dots$ be the solution of the Lyapunov equation (4.73) with the initial condition $\Pi(0) = 0$. Let $M_0 := A$, $N_0 := Q$. For $k = 1, 2, \dots$, we iterate

$$N_k := M_{k-1}N_{k-1}M_{k-1}^T + N_{k-1}, \qquad M_k := M_{k-1}^2$$

Show that $\Pi(2^k) = N_k$, $k = 1, 2, \dots$ holds. This scheme is called a doubling algorithm for solving a stationary Lyapunov equation.


4.12 Prove (4.81). 


5 


Kalman Filter 


This chapter is concerned with the discrete-time Kalman filter for stochastic dynamic systems. First, we review the multi-dimensional Gaussian probability density function and the least-squares (or minimum variance) estimation problem. We then derive the Kalman filter algorithm for discrete-time stochastic linear systems by using the orthogonal projection. The filter algorithm is extended so that it can be applied to stochastic systems with exogenous inputs. Moreover, we derive stationary forward and backward Kalman filter algorithms, which are useful for the study of the stochastic realization problem.


5.1 Multivariate Gaussian Distribution 


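We consider a multivariate Gaussian distribution and the minimum variance estimation problem. Let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^p$ be jointly Gaussian random vectors. Let the mean vectors be given by $\mu_x = E\{x\}$ and $\mu_y = E\{y\}$ and the covariance matrices by

$$\Sigma = \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix} = \begin{bmatrix}\operatorname{cov}\{x,x\} & \operatorname{cov}\{x,y\}\\ \operatorname{cov}\{y,x\} & \operatorname{cov}\{y,y\}\end{bmatrix}$$

where we assume that the covariance matrix $\Sigma \in \mathbb{R}^{(n+p)\times(n+p)}$ is positive definite. For convenience, we define a quadratic form

$$Q(x,y) = \begin{bmatrix}x - \mu_x\\ y - \mu_y\end{bmatrix}^T\Sigma^{-1}\begin{bmatrix}x - \mu_x\\ y - \mu_y\end{bmatrix} \tag{5.1}$$

Then the joint probability density function of $(x,y)$ can be written as

$$p(x,y) = \frac{1}{C}\exp\Big\{-\frac{1}{2}Q(x,y)\Big\} \tag{5.2}$$

where $C = (2\pi)^{(n+p)/2}|\Sigma|^{1/2}$ is the normalization factor.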
Lemma 5.1. Let the probability density function $p(x,y)$ be given by (5.2). Then, the conditional distribution of $x$ given $y$ is also Gaussian with mean

$$E\{x \mid y\} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \tag{5.3}$$

and covariance matrix

$$E\{[x - E\{x \mid y\}][x - E\{x \mid y\}]^T\} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \tag{5.4}$$

Moreover, the vector $x - E\{x \mid y\}$ is independent of $y$, i.e., the orthogonality condition $x - E\{x \mid y\} \perp y$ holds.


Proof. First we compute the joint probability density function $p(x,y)$. Define

$$\Sigma^{-1} = \begin{bmatrix}V_{xx} & V_{xy}\\ V_{yx} & V_{yy}\end{bmatrix}$$

Also, define $\Psi := \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. Then, it follows from Problem 2.3 (c) that

$$V_{xx} = \Psi^{-1}, \qquad V_{xx}^{-1}V_{xy} = -\Sigma_{xy}\Sigma_{yy}^{-1}, \qquad V_{yy} - V_{yx}V_{xx}^{-1}V_{xy} = \Sigma_{yy}^{-1}$$

Thus $Q(x,y)$ defined by (5.1) becomes

$$\begin{aligned}Q(x,y) &= (x - \mu_x)^TV_{xx}(x - \mu_x) + (x - \mu_x)^TV_{xy}(y - \mu_y) \\ &\quad + (y - \mu_y)^TV_{yx}(x - \mu_x) + (y - \mu_y)^TV_{yy}(y - \mu_y) \\ &= [x - \mu_x + V_{xx}^{-1}V_{xy}(y - \mu_y)]^TV_{xx}[x - \mu_x + V_{xx}^{-1}V_{xy}(y - \mu_y)] \\ &\quad + (y - \mu_y)^T(V_{yy} - V_{yx}V_{xx}^{-1}V_{xy})(y - \mu_y) \\ &= (x - a)^T\Psi^{-1}(x - a) + (y - \mu_y)^T\Sigma_{yy}^{-1}(y - \mu_y)\end{aligned}$$

where $a := \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$. Therefore, the joint probability density function $p(x,y)$ is given by

$$p(x,y) = \frac{1}{C'}\exp\Big\{-\frac{1}{2}(x - a)^T\Psi^{-1}(x - a)\Big\} \times \frac{1}{C''}\exp\Big\{-\frac{1}{2}(y - \mu_y)^T\Sigma_{yy}^{-1}(y - \mu_y)\Big\} \tag{5.5}$$

where $C' = (2\pi)^{n/2}|\Psi|^{1/2}$ and $C'' = (2\pi)^{p/2}|\Sigma_{yy}|^{1/2}$. Thus integrating $p(x,y)$ with respect to $x$ yields the marginal probability density function

$$p(y) = \frac{1}{C''}\exp\Big\{-\frac{1}{2}(y - \mu_y)^T\Sigma_{yy}^{-1}(y - \mu_y)\Big\} \tag{5.6}$$

It also follows from (5.5) and (5.6) that the conditional probability density function is given by

$$p(x \mid y) = \frac{1}{C'}\exp\Big\{-\frac{1}{2}(x - a)^T\Psi^{-1}(x - a)\Big\}$$

From this, (5.3) and (5.4) hold. Also, we see from (5.5) that $x - a = x - E\{x \mid y\}$ and $y$ are independent.


In the above proof, it is assumed that $\Sigma$ is nonsingular. For the case where $\Sigma$ is singular, the results still hold if we replace the inverse $\Sigma_{yy}^{-1}$ by the pseudo-inverse $\Sigma_{yy}^{\dagger}$ introduced in Lemma 2.10.


Lemma 5.2. Suppose that $(x, y)$ are jointly Gaussian random vectors. Then, the minimum variance estimate of $x$ based on $y$ is given by the conditional mean

$$\hat{x} = E\{x \mid y\} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \tag{5.7}$$


Proof. It may be noted that the minimum variance estimate $\hat{x}$ is a $y$-measurable function $f(y)$ that minimizes $E\{\|x - f(y)\|^2\}$. It can be shown that

$$\begin{aligned}E\{\|x - f(y)\|^2\} &= E\{\|x - a + a - f(y)\|^2\} \\ &= E\{\|x - a\|^2\} + 2E\{[x - a]^T[a - f(y)]\} + E\{\|a - f(y)\|^2\}\end{aligned}$$

Since $a - f(y)$ is $y$-measurable and since $E\{x \mid y\} = a$, the second term on the right-hand side becomes

$$E\{[x - a]^T[a - f(y)]\} = E\{E\{[x - a]^T[a - f(y)] \mid y\}\} = E\{E\{[x - a]^T \mid y\}[a - f(y)]\} = 0$$

Thus we have

$$E\{\|x - f(y)\|^2\} = E\{\|x - a\|^2\} + E\{\|a - f(y)\|^2\} \ge E\{\|x - a\|^2\}$$

where the equality holds if and only if $f(y) = a$ with probability one. Hence, the minimum variance estimate is given by the conditional mean $a = E\{x \mid y\}$.


Suppose that $x, y$ are jointly Gaussian random vectors. Then, from Lemma 5.2, the conditional expectation $E\{x \mid y\}$ is an affine function of $y$, so that, in the Gaussian case, the minimum variance estimate is obtained by the orthogonal projection of $x$ onto the linear space generated by $y$ (see Section 5.2).


Example 5.1. Consider a linear regression model given by
$$ y = Hx + v $$


where $x \in \mathbb{R}^n$ is the input Gaussian random vector with $N(\mu_x, P)$, $y \in \mathbb{R}^p$ the
output vector, $v \in \mathbb{R}^p$ a Gaussian white noise vector with $N(0, R)$ independent of $x$, and $H \in \mathbb{R}^{p \times n}$
a constant matrix. We compute the minimum variance estimate of $x$ based on the
observation $y$, together with the error covariance matrix. From the regression model, $\mu_y = H\mu_x$ and
$$ \Sigma_{xy} = E\{(x - \mu_x)(y - \mu_y)^T\} = PH^T $$
$$ \Sigma_{yy} = E\{(y - \mu_y)(y - \mu_y)^T\} = HPH^T + R $$


Therefore, from Lemma 5.2, the minimum variance estimate is given by
$$ \hat{x} = \mu_x + PH^T[HPH^T + R]^{-1}(y - H\mu_x) $$
Also, from (5.4), the error covariance matrix $\tilde{P} := E\{[x - \hat{x}][x - \hat{x}]^T\}$ is

$$ \tilde{P} = P - PH^T[HPH^T + R]^{-1}HP \tag{5.8} $$

where $HPH^T + R$ is assumed to be nonsingular.
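As a minimal sketch of Example 5.1, the following Python code evaluates the minimum variance estimate and the error covariance (5.8). It assumes NumPy, which the text does not prescribe. The numbers for $\mu_x$, $P$, $H$, $R$ and $y$ are hypothetical, and $x$ and $v$ are taken to be independent.

```python
import numpy as np

def min_variance_estimate(mu_x, P, H, R, y):
    """Conditional mean (5.7) and error covariance (5.8) for y = H x + v.

    Assumes x ~ N(mu_x, P) and v ~ N(0, R) are independent.
    """
    S_yy = H @ P @ H.T + R                   # Sigma_yy = H P H^T + R
    G = P @ H.T @ np.linalg.inv(S_yy)        # Sigma_xy Sigma_yy^{-1}
    x_hat = mu_x + G @ (y - H @ mu_x)        # uses mu_y = H mu_x
    P_tilde = P - G @ H @ P                  # (5.8)
    return x_hat, P_tilde

# Hypothetical data, for illustration only
mu_x = np.array([1.0, -1.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
R = 0.1 * np.eye(3)
y = np.array([1.2, 0.1, -1.9])

x_hat, P_tilde = min_variance_estimate(mu_x, P, H, R, y)
print("x_hat =", x_hat)
print("P_tilde =", P_tilde)
```

The error covariance does not depend on the observed value of $y$. Only the estimate $\hat{x}$ does.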


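Lemma 5.3. For $P \in \mathbb{R}^{n \times n}$, $H \in \mathbb{R}^{p \times n}$, $R \in \mathbb{R}^{p \times p}$, we have
$$ PH^T[R + HPH^T]^{-1} = [P^{-1} + H^TR^{-1}H]^{-1}H^TR^{-1} \tag{5.9} $$
where it is assumed that $P$ and $R$ are nonsingular.
Proof. The following identity is immediate:
$$ [P^{-1} + H^TR^{-1}H]PH^T = H^TR^{-1}[R + HPH^T] $$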


Pre-multiplying the above equation by $[P^{-1} + H^TR^{-1}H]^{-1}$ and post-multiplying it
by $[R + HPH^T]^{-1}$ yields (5.9).


It follows from (5.9) that the right-hand side of (5.8) becomes
$$ P - PH^T[HPH^T + R]^{-1}HP = [P^{-1} + H^TR^{-1}H]^{-1} \tag{5.10} $$
Equations (5.9) and (5.10) are usually called the matrix inversion lemmas.
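The two identities can be checked numerically. The sketch below assumes NumPy and uses randomly drawn positive definite $P$ and $R$ with an arbitrary $H$ as hypothetical test data. It compares both sides of (5.9) and (5.10). The left-hand sides require inverting a $p \times p$ matrix, whereas the right-hand sides require $n \times n$ inverses together with $R^{-1}$. This difference is the usual reason for preferring one form over the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3

# Hypothetical test matrices: P (n x n) and R (p x p) symmetric positive definite
X = rng.standard_normal((n, n))
P = X @ X.T + n * np.eye(n)
Z = rng.standard_normal((p, p))
R = Z @ Z.T + p * np.eye(p)
H = rng.standard_normal((p, n))

P_inv, R_inv = np.linalg.inv(P), np.linalg.inv(R)
info_inv = np.linalg.inv(P_inv + H.T @ R_inv @ H)      # [P^{-1} + H^T R^{-1} H]^{-1}

# (5.9): P H^T [R + H P H^T]^{-1} = [P^{-1} + H^T R^{-1} H]^{-1} H^T R^{-1}
lhs_59 = P @ H.T @ np.linalg.inv(R + H @ P @ H.T)
rhs_59 = info_inv @ H.T @ R_inv

# (5.10): P - P H^T [H P H^T + R]^{-1} H P = [P^{-1} + H^T R^{-1} H]^{-1}
lhs_510 = P - lhs_59 @ H @ P
rhs_510 = info_inv

print(np.max(np.abs(lhs_59 - rhs_59)), np.max(np.abs(lhs_510 - rhs_510)))
```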


Lemma 5.4. Let $(x, y, z)$ be jointly Gaussian random vectors. If $y$ and $z$ are mutually uncorrelated, we have

$$ E\{x \mid y, z\} = E\{x \mid y\} + E\{x \mid z\} - \mu_x \tag{5.11} $$
Proof. Define $w^T := (y^T, z^T)$ and $\mu_w^T := (\mu_y^T, \mu_z^T)$. Then we have $E\{x \mid w\} =
E\{x \mid y, z\}$. Since $y$ and $z$ are uncorrelated,


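$$ \Sigma_{xw} = \begin{bmatrix} \Sigma_{xy} & \Sigma_{xz} \end{bmatrix}, \qquad \Sigma_{ww} = \begin{bmatrix} \Sigma_{yy} & 0 \\ 0 & \Sigma_{zz} \end{bmatrix} $$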


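Thus, from Lemma 5.1,
$$
\begin{aligned}
E\{x \mid w\} &= \mu_x + \Sigma_{xw}\Sigma_{ww}^{-1}(w - \mu_w) \\
&= \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z)
\end{aligned}
$$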


Since $E\{x \mid y\} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ and $E\{x \mid z\} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z)$,
we see that (5.11) holds.
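A numerical sketch of Lemma 5.4 follows, assuming NumPy. The covariances come from a hypothetical model $x = Fy + Gz + e$ in which $y$, $z$, $e$ are mutually uncorrelated, so that $\Sigma_{yz} = 0$. The sketch compares the joint conditional mean computed from the stacked vector $w$ with the right-hand side of (5.11).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 2, 3, 2

def spd(k):
    """Random symmetric positive definite k x k matrix (hypothetical test data)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

S_yy, S_zz = spd(p), spd(q)
F, G = rng.standard_normal((n, p)), rng.standard_normal((n, q))
S_xy, S_xz = F @ S_yy, G @ S_zz          # cross-covariances implied by x = F y + G z + e
mu_x, mu_y, mu_z = np.ones(n), np.zeros(p), 2.0 * np.ones(q)

# Stacked w = (y, z): Sigma_ww is block diagonal because Sigma_yz = 0
S_xw = np.hstack([S_xy, S_xz])
S_ww = np.block([[S_yy, np.zeros((p, q))], [np.zeros((q, p)), S_zz]])

y, z = rng.standard_normal(p), rng.standard_normal(q)
w, mu_w = np.concatenate([y, z]), np.concatenate([mu_y, mu_z])

joint = mu_x + S_xw @ np.linalg.solve(S_ww, w - mu_w)   # E{x | y, z}
sep_y = mu_x + S_xy @ np.linalg.solve(S_yy, y - mu_y)   # E{x | y}
sep_z = mu_x + S_xz @ np.linalg.solve(S_zz, z - mu_z)   # E{x | z}
print(joint, sep_y + sep_z - mu_x)                      # the two sides of (5.11)
```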
We consider the minimum variance estimation problem by using the orthogonal
projection in a Hilbert space of random vectors with finite variances. Let $x \in \mathbb{R}^n$ be
a random vector with finite second-order moment


$$ E\{\|x\|^2\} = \sum_{i=1}^{n} E\{x_i^2\} < \infty $$


Let the set of random vectors with finite second-order moments be
$$ \mathcal{H} = \{x \mid E\{\|x\|^2\} < \infty\} $$


Then it is easy to show that $\mathcal{H}$ is a linear space.
For $x, y \in \mathcal{H}$, we define the inner product by


$$ \langle x, y \rangle_{\mathcal{H}} = E\{x^Ty\} = \operatorname{trace} E\{xy^T\} $$


and the norm by
$$ \|x\|_{\mathcal{H}} = \sqrt{\langle x, x \rangle_{\mathcal{H}}} = \sqrt{E\{\|x\|^2\}} $$


By completing $\mathcal{H}$ with respect to this norm, we obtain a Hilbert space of $n$-dimensional
random vectors with finite variances, which is again written as $\mathcal{H} = L_2(\Omega)$.

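Let $x, y \in \mathcal{H}$. If $\langle x, y \rangle_{\mathcal{H}} = 0$ holds, then we say that $x$ and $y$ are orthogonal,
and write $x \perp y$. Suppose that $\mathcal{Y}$ is a subspace of $\mathcal{H}$. If $\langle x, y \rangle_{\mathcal{H}} = 0$ holds for any
$y \in \mathcal{Y}$, then we say that $x$ is orthogonal to $\mathcal{Y}$, and write $x \perp \mathcal{Y}$. Let $x \in \mathcal{H}$. Then,
from Lemma 4.6, there exists a unique $y_0 \in \mathcal{Y}$ such that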


$$ \|x - y_0\|_{\mathcal{H}} \leq \|x - y\|_{\mathcal{H}}, \quad \forall y \in \mathcal{Y} $$


Thus $y_0$ is a minimizing vector, and the optimality condition is that $x - y_0 \perp \mathcal{Y}$.
Suppose that $y_1, \dots, y_N$ are $p$-dimensional random vectors with finite second-order
moments. Let $\mathcal{Y}$ be the subspace generated by $y_1, \dots, y_N$, i.e.,


$$ \mathcal{Y} = \Big\{ a + \sum_{i=1}^{N} A_i y_i \;\Big|\; a \in \mathbb{R}^n, \ A_i \in \mathbb{R}^{n \times p} \Big\} \tag{5.12} $$


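Any element $\hat{x} \in \mathcal{Y}$ is an $n$-dimensional random vector with finite second-order
moment. By completing the linear space $\mathcal{Y}$ with respect to the norm $\|\cdot\|_{\mathcal{H}}$ defined above, we
see that $\mathcal{Y}$ becomes a Hilbert subspace of $\mathcal{H}$, i.e., $\mathcal{Y} \subset \mathcal{H}$.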


Lemma 5.5. Let $\tilde{x}$ be an element of $\mathcal{H}$, and $\mathcal{Y}$ the subspace defined by (5.12).
Then $\tilde{x}$ is orthogonal to $\mathcal{Y}$ if and only if the following conditions hold:


$$ E\{\tilde{x}\} = 0, \qquad E\{\tilde{x}y_i^T\} = 0, \quad i = 1, \dots, N \tag{5.13} $$


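Proof. Since any element $\hat{x} \in \mathcal{Y}$ is expressed as in (5.12), we see that if (5.13)
holds, then
$$ \langle \tilde{x}, \hat{x} \rangle_{\mathcal{H}} = \Big\langle \tilde{x}, a + \sum_{i=1}^{N} A_i y_i \Big\rangle_{\mathcal{H}} = E\{\tilde{x}^T\}a + \sum_{i=1}^{N} \operatorname{trace}\big(E\{\tilde{x}y_i^T\}A_i^T\big) = 0 $$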
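Conversely, suppose that $\tilde{x} \perp \mathcal{Y}$ holds, i.e., $\langle \tilde{x}, \hat{x} \rangle_{\mathcal{H}} = 0$ for any $\hat{x} \in \mathcal{Y}$. Putting
$\hat{x} = a$, we have $\langle \tilde{x}, a \rangle_{\mathcal{H}} = \operatorname{trace}(E\{\tilde{x}\}a^T) = 0$. Taking $a = E\{\tilde{x}\}$ yields
$\|E\{\tilde{x}\}\|^2 = 0$, implying that $E\{\tilde{x}\} = 0$. Next, let $\hat{x} = A_i y_i$. It follows that
$\langle \tilde{x}, A_i y_i \rangle_{\mathcal{H}} = \operatorname{trace}(E\{\tilde{x}y_i^T\}A_i^T) = 0$. Similarly, taking $A_i = E\{\tilde{x}y_i^T\} \in \mathbb{R}^{n \times p}$
yields
$$ \operatorname{trace}\big(E\{\tilde{x}y_i^T\}E\{\tilde{x}y_i^T\}^T\big) = \|E\{\tilde{x}y_i^T\}\|_F^2 = 0 \;\Longrightarrow\; E\{\tilde{x}y_i^T\} = 0 $$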


This completes the proof of the lemma.


Example 5.2. Consider the random vectors $x, y$ with the probability density function
(5.2). Let the data space be given by $\mathcal{Y} = \{b + Ay \mid b \in \mathbb{R}^n, A \in \mathbb{R}^{n \times p}\}$. Then the
orthogonal projection of $x$ onto the space $\mathcal{Y}$ is given by


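$$ \hat{E}\{x \mid \mathcal{Y}\} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) $$
In fact, let $\tilde{x} = x - (b + Ay)$. Then, from the conditions of Lemma 5.5, we have
$$ 0 = E\{\tilde{x}\} = E\{x - (b + Ay)\} = \mu_x - b - A\mu_y $$
$$ 0 = E\{\tilde{x}y^T\} = E\{[x - (b + Ay)]y^T\} $$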


From the first condition, we have $b = \mu_x - A\mu_y$. Substituting this into the
second relation gives $E\{[x - \mu_x - A(y - \mu_y)]y^T\} = 0$. Since $x - \mu_x - A(y - \mu_y)$ has zero mean, it follows that


$$ E\{[x - \mu_x - A(y - \mu_y)][y - \mu_y]^T\} = 0 $$


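Thus we obtain
$$ \Sigma_{xy} - A\Sigma_{yy} = 0 \;\Longrightarrow\; A = \Sigma_{xy}\Sigma_{yy}^{-1} $$


and hence
$$ b + Ay = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) $$


Thus we have shown that the orthogonal projection is equivalent to the conditional
expectation (5.3), and hence to the minimum variance estimate (5.7).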
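Example 5.2 can also be illustrated by sampling. The sketch below assumes NumPy, and the model and sample size are hypothetical. It draws samples from the regression model of Example 5.1 and fits $x$ on $(1, y)$ by ordinary least squares, which is a projection onto the sample counterpart of $\{b + Ay\}$. It then compares the fitted coefficients with $A = \Sigma_{xy}\Sigma_{yy}^{-1}$ and $b = \mu_x - A\mu_y$. The residuals satisfy the sample versions of the two conditions in (5.13) to rounding error. The fitted $A$ and $b$ match the population values only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

# Hypothetical jointly Gaussian pair built as in Example 5.1: y = H x + v
mu_x = np.array([1.0, -0.5])
P = np.array([[1.0, 0.3], [0.3, 0.5]])
H = np.array([[1.0, 2.0], [0.0, 1.0]])
R = np.diag([0.2, 0.1])

x = rng.multivariate_normal(mu_x, P, size=N)           # rows are samples
v = rng.multivariate_normal(np.zeros(2), R, size=N)
y = x @ H.T + v

# Population projection: A = Sigma_xy Sigma_yy^{-1}, b = mu_x - A mu_y
A_true = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
b_true = mu_x - A_true @ (H @ mu_x)

# Sample projection: least squares of x on [1, y]
design = np.hstack([np.ones((N, 1)), y])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)      # shape (3, 2)
b_hat, A_hat = coef[0], coef[1:].T

resid = x - design @ coef                              # sample counterpart of x - (b + A y)
print(np.round(A_hat, 3))
print(np.round(A_true, 3))
```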


Suppose that $y_1, \dots, y_N$ are $p$-dimensional random vectors, and that there exists
a set of $p$-dimensional independent random vectors $\bar{y}_1, \dots, \bar{y}_N$ such that


$$ \sigma\{y_i, \ i = 1, \dots, k\} = \sigma\{\bar{y}_i, \ i = 1, \dots, k\}, \quad k = 1, \dots, N \tag{5.14} $$


where $\sigma\{y_i, \ i = 1, \dots, k\}$ is the $\sigma$-algebra generated by $\{y_i, \ i = 1, \dots, k\}$,
which is, roughly speaking, the information contained in $\{y_i, \ i = 1, \dots, k\}$. In this case, the
random vectors $\bar{y}_1, \dots, \bar{y}_N$ are called the innovations of $y_1, \dots, y_N$.
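Example 5.3. We derive the innovations for $p$-dimensional Gaussian random vectors
$y_1, \dots, y_N$. Let $\mathcal{F}_k = \sigma\{y_i, \ i = 1, \dots, k\}$, with $\mathcal{F}_0$ the trivial $\sigma$-algebra, and define $\bar{y}_1, \dots, \bar{y}_N$ as
$$
\begin{aligned}
\bar{y}_1 &= y_1 - E\{y_1 \mid \mathcal{F}_0\} = y_1 - E\{y_1\} \\
\bar{y}_2 &= y_2 - E\{y_2 \mid \mathcal{F}_1\} \\
&\ \ \vdots \\
\bar{y}_N &= y_N - E\{y_N \mid \mathcal{F}_{N-1}\}
\end{aligned}
\tag{5.15}
$$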


Since, for Gaussian random vectors, the conditional expectation coincides with the
orthogonal projection onto the data space, we have
$$ E\{y_k \mid \mathcal{F}_{k-1}\} = \hat{E}\{y_k \mid y_1, \dots, y_{k-1}\} = a_k + \sum_{i=1}^{k-1} A_{ki}y_i, \quad A_{ki} \in \mathbb{R}^{p \times p} $$
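We see from (5.15) that
$$
\begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_k \end{bmatrix}
=
\begin{bmatrix} I_p & & & \\ -A_{21} & I_p & & \\ \vdots & & \ddots & \\ -A_{k1} & \cdots & -A_{k,k-1} & I_p \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}
-
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}
\tag{5.16}
$$
This shows that $\bar{y}_k$ is a Gaussian random vector, since it is a linear combination of
$a_1, \dots, a_k$ and $y_1, \dots, y_k$. Since the $pk \times pk$ lower triangular matrix in (5.16) is
nonsingular, we see that $y_k$ is also expressed as a linear combination of $\bar{y}_1, \dots, \bar{y}_k$ and
$a_1, \dots, a_k$. Hence (5.14) holds.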
We show that $\bar{y}_1, \dots, \bar{y}_N$ are independent. From Lemma 5.1, $\bar{y}_k$ and $\mathcal{F}_{k-1}$ are
independent, so that we get $E\{\bar{y}_k \mid \mathcal{F}_{k-1}\} = 0$ and $E\{\bar{y}_k\} = 0$. Since, for $k > l$, $\bar{y}_l$
is $\mathcal{F}_{k-1}$-measurable,


$$ E\{\bar{y}_k\bar{y}_l^T\} = E\{E\{\bar{y}_k\bar{y}_l^T \mid \mathcal{F}_{k-1}\}\} = E\{E\{\bar{y}_k \mid \mathcal{F}_{k-1}\}\bar{y}_l^T\} = 0 $$


Transposing, the above relation also holds for $k < l$, so that $E\{\bar{y}_k\bar{y}_l^T\} =
0$, $k \neq l$. Since uncorrelated jointly Gaussian random vectors are independent, $\bar{y}_k$
and $\bar{y}_l$ $(k \neq l)$ are independent. Hence $\bar{y}_1, \dots, \bar{y}_N$ are the innovations
for the Gaussian random vectors $y_1, \dots, y_N$.
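Consider scalar observations ($p = 1$) with zero mean, so that $a_k = 0$. In this case the lower triangular matrix in (5.16) is the inverse of the unit lower triangular factor $L$ in the factorization $\Lambda = LDL^T$ of the covariance matrix $\Lambda$ of $(y_1, \dots, y_N)$. The diagonal matrix $D$ is then the covariance matrix of the innovations. The sketch below assumes NumPy and obtains $L$ and $D$ from a Cholesky factor. It uses the hypothetical AR(1)-type covariance $\Lambda_{ij} = \rho^{|i-j|}$, for which $E\{y_k \mid \mathcal{F}_{k-1}\} = \rho y_{k-1}$.

```python
import numpy as np

def innovations_factor(Lam):
    """Return unit lower-triangular L and diagonal D with Lam = L D L^T."""
    Cf = np.linalg.cholesky(Lam)       # Lam = Cf Cf^T, Cf lower triangular
    d = np.diag(Cf)
    L = Cf / d                         # divide column j by d[j]: unit diagonal
    D = np.diag(d ** 2)
    return L, D

# Hypothetical covariance of a zero-mean scalar sequence y_1, ..., y_5
N, rho = 5, 0.7
Lam = np.array([[rho ** abs(i - j) for j in range(N)] for i in range(N)])

L, D = innovations_factor(Lam)
T = np.linalg.inv(L)                   # maps (y_1, ..., y_N) to the innovations, as in (5.16)
cov_innov = T @ Lam @ T.T              # covariance of the innovations
print(np.round(T, 3))
print(np.round(np.diag(cov_innov), 3))
```

In this example, each row of `T` has a single nonzero entry $-\rho$ below the diagonal. This is the statement $\bar{y}_k = y_k - \rho y_{k-1}$ for $k \geq 2$.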
We consider a state estimation problem for discrete-time stochastic linear dynamic
systems. This is the celebrated Kalman filtering problem.
We deal with a discrete-time stochastic linear system described by


$$ x(t+1) = A(t)x(t) + w(t) \tag{5.17a} $$
$$ y(t) = C(t)x(t) + v(t), \quad t = 0, 1, \dots \tag{5.17b} $$
where $x \in \mathbb{R}^n$ is the state vector, $y \in \mathbb{R}^p$ the observation vector, $w \in \mathbb{R}^n$ the
plant noise vector, and $v \in \mathbb{R}^p$ the observation noise vector. Also, $A(t) \in \mathbb{R}^{n \times n}$ and
$C(t) \in \mathbb{R}^{p \times n}$ are deterministic functions of time $t$. Moreover, $w$ and $v$ are zero
mean Gaussian white noise vectors with covariance matrices


$$ E\left\{ \begin{bmatrix} w(t) \\ v(t) \end{bmatrix} \begin{bmatrix} w^T(s) & v^T(s) \end{bmatrix} \right\} = \begin{bmatrix} Q(t) & S(t) \\ S^T(t) & R(t) \end{bmatrix}\delta_{ts} \tag{5.18} $$


where $Q(t) \in \mathbb{R}^{n \times n}$ is nonnegative definite, and $R(t) \in \mathbb{R}^{p \times p}$ is positive definite
for all $t = 0, 1, \dots$. The initial state $x(0)$ is Gaussian with mean $E\{x(0)\} = \mu_x(0)$
and covariance matrix


$$ E\{[x(0) - \mu_x(0)][x(0) - \mu_x(0)]^T\} = \Pi(0) $$


and is uncorrelated with the noises $w(t)$, $v(t)$, $t = 0, 1, \dots$. A block diagram of the
Markov model is depicted in Figure 5.1.


Figure 5.1. Stochastic linear dynamic system 


Let $\mathcal{F}_t = \sigma\{y(0), y(1), \dots, y(t)\}$ be the $\sigma$-algebra generated by the observations
up to the present time $t$. We see that $\mathcal{F}_t$ is the information carried by the output
observations, satisfying $\mathcal{F}_s \subset \mathcal{F}_t$, $s < t$. Thus $\mathcal{F}_t$ is called an increasing family of
$\sigma$-algebras, or a filtration. We now formulate the state estimation problem.


State Estimation Problem 


The problem is to find the minimum variance estimate $\hat{x}(t+m \mid t)$ of the state vector
$x(t+m)$ based on the observations up to time $t$. This is equivalent to designing a
filter that produces $\hat{x}(t+m \mid t)$ minimizing the performance index


$$ J = E\{\|x(t+m) - \hat{x}(t+m \mid t)\|^2\} \tag{5.19} $$


where $\hat{x}(t+m \mid t)$ is $\mathcal{F}_t$-measurable. The estimation problem is called prediction,
filtering or smoothing according as $m > 0$, $m = 0$ or $m < 0$.


We see from Lemma 5.2 that the optimal estimate $\hat{x}(t+m \mid t)$ that minimizes
the performance index (5.19) is expressed in terms of the conditional expectation
of $x(t+m)$ given $\mathcal{F}_t$ as
$$ \hat{x}(t+m \mid t) = E\{x(t+m) \mid \mathcal{F}_t\} $$


Let the estimation error be defined by $\tilde{x}(t+m \mid t) := x(t+m) - \hat{x}(t+m \mid t)$ and
the error covariance matrix by


$$ P(t+m \mid t) := E\{[x(t+m) - \hat{x}(t+m \mid t)][x(t+m) - \hat{x}(t+m \mid t)]^T\} $$


From Lemmas 4.7, 4.8 and 4.9, we see that $(x, y)$ of (5.17) are jointly Gaussian
processes. For Gaussian processes, the conditional expectation $\hat{x}(t+m \mid t)$ is a
linear function of the observations $y(0), y(1), \dots, y(t)$, so that the optimal estimate
coincides with the linear minimum variance estimate of $x(t+m)$ given the observations
up to time $t$. More precisely, we define the linear space generated by the observations
as


$$ \mathcal{Y}_t = \Big\{ c + \sum_{i=0}^{t} A_i y(i) \;\Big|\; c \in \mathbb{R}^n, \ A_i \in \mathbb{R}^{n \times p} \Big\} \tag{5.20} $$


The space $\mathcal{Y}_t$ is called the data space at time $t$. Then, from Lemma 5.5, we have the
following result.


Lemma 5.6. The minimum variance estimate $\hat{x}(t+m \mid t)$ is given by the orthogonal
projection of $x(t+m)$ onto $\mathcal{Y}_t$, i.e.,


$$ \hat{x}(t+m \mid t) = \hat{E}\{x(t+m) \mid \mathcal{Y}_t\} \tag{5.21} $$


The optimality of $\hat{x}(t+m \mid t)$ means that the estimation error $\tilde{x}(t+m \mid t)$ is orthogonal
to the data space (see Figure 5.2):


$$ \tilde{x}(t+m \mid t) = x(t+m) - \hat{x}(t+m \mid t) \perp \mathcal{Y}_t \tag{5.22} $$


Moreover, the minimum variance estimate is unbiased.
Proof. Equations (5.21) and (5.22) are obvious from Lemma 5.5. Since the data
space $\mathcal{Y}_t$ contains constant vectors, it also follows that $E\{\tilde{x}(t+m \mid t)\} = 0$, $t =
0, 1, \dots$. Thus the minimum variance estimate is unbiased.


Figure 5.2. Orthogonal projection onto data space $\mathcal{Y}_t$
Now we define
$$ e(t) = y(t) - E\{y(t) \mid \mathcal{F}_{t-1}\}, \quad t = 1, 2, \dots \tag{5.23} $$


where $e(0) = y(0) - E\{y(0) \mid \mathcal{F}_{-1}\} = y(0) - \mu_y(0)$, with $\mathcal{F}_{-1}$ the trivial $\sigma$-algebra. Then, as in Example 5.3, it
can be shown that $e$ is the innovation process for $y$.


Lemma 5.7. The innovation process $e \in \mathbb{R}^p$ is a Gaussian process with mean zero
and covariance matrix


$$ E\{e(t)e^T(s)\} = [C(t)P(t \mid t-1)C^T(t) + R(t)]\delta_{ts} \tag{5.24} $$
where $P(t \mid t-1)$ is the error covariance matrix defined by
$$ P(t \mid t-1) = E\{[x(t) - \hat{x}(t \mid t-1)][x(t) - \hat{x}(t \mid t-1)]^T\} $$


Proof. Since $y$ is Gaussian, the conditional expectation $E\{y(t) \mid \mathcal{F}_{t-1}\}$ is Gaussian,
and hence $e$ is Gaussian. By the definition (5.23), we see that


$$ E\{e(t) \mid \mathcal{F}_{t-1}\} = 0, \qquad E\{e(t)\} = 0 $$


Since $e(s)$ is a function of $y(0), y(1), \dots, y(s)$, it is $\mathcal{F}_{t-1}$-measurable if $t > s$.
Therefore, by the property of conditional expectation,


$$
\begin{aligned}
E\{e(t)e^T(s)\} &= E\{E\{e(t)e^T(s) \mid \mathcal{F}_{t-1}\}\} \\
&= E\{E\{e(t) \mid \mathcal{F}_{t-1}\}e^T(s)\} = 0
\end{aligned}
$$


Similarly, we can prove that the above equality holds for $t < s$. Thus $e(t)$ and $e(s)$,
$t \neq s$, are uncorrelated.
We show that (5.24) holds for $t = s$. It follows from (5.17b) that


$$
\begin{aligned}
e(t) &= y(t) - E\{C(t)x(t) + v(t) \mid \mathcal{F}_{t-1}\} \\
&= y(t) - C(t)\hat{x}(t \mid t-1) = C(t)\tilde{x}(t \mid t-1) + v(t)
\end{aligned}
$$
so that
$$
\begin{aligned}
E\{e(t)e^T(t)\} &= E\{[C(t)\tilde{x}(t \mid t-1) + v(t)][C(t)\tilde{x}(t \mid t-1) + v(t)]^T\} \\
&= C(t)E\{\tilde{x}(t \mid t-1)\tilde{x}^T(t \mid t-1)\}C^T(t) + C(t)E\{\tilde{x}(t \mid t-1)v^T(t)\} \\
&\quad + E\{v(t)\tilde{x}^T(t \mid t-1)\}C^T(t) + E\{v(t)v^T(t)\}
\end{aligned}
\tag{5.25}
$$
Since $v(t)$ is uncorrelated with $x(t)$ and $\hat{x}(t \mid t-1)$, we have

$$ E\{\tilde{x}(t \mid t-1)v^T(t)\} = E\{[x(t) - \hat{x}(t \mid t-1)]v^T(t)\} = 0 $$


Thus the second and third terms on the right-hand side of (5.25) vanish,
and (5.24) follows from the definitions of $R(t)$ and $P(t \mid t-1)$.
In the following, we derive a recursive algorithm that produces the one-step
predicted estimates $\hat{x}(t+1 \mid t)$ and $\hat{x}(t \mid t-1)$ by using the orthogonal projection. We
employ (5.21) as the definition of the optimal estimate.

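From the definitions of $e(t)$ and $\mathcal{Y}_t$, the innovation process is also expressed as


$$ e(t) = y(t) - \hat{E}\{y(t) \mid \mathcal{Y}_{t-1}\} $$


Thus we have $\mathcal{Y}_t = \mathcal{Y}_{t-1} \oplus \operatorname{span}\{e(t)\}$, where $\oplus$ denotes the orthogonal sum. It
therefore follows that


$$
\begin{aligned}
\hat{x}(t+1 \mid t) &= \hat{E}\{x(t+1) \mid \mathcal{Y}_t\} = \hat{E}\{x(t+1) \mid \mathcal{Y}_{t-1} \oplus e(t)\} \\
&= \hat{E}\{x(t+1) \mid \mathcal{Y}_{t-1}\} + \hat{E}\{x(t+1) \mid e(t)\}
\end{aligned}
\tag{5.26}
$$
Since $w(t) \perp \mathcal{Y}_{t-1}$, the first term on the right-hand side is expressed as
$$ \hat{E}\{x(t+1) \mid \mathcal{Y}_{t-1}\} = \hat{E}\{A(t)x(t) + w(t) \mid \mathcal{Y}_{t-1}\} = A(t)\hat{x}(t \mid t-1) \tag{5.27} $$
and the second term is given by
$$ \hat{E}\{x(t+1) \mid e(t)\} = K(t)e(t) \tag{5.28} $$


where $K(t) \in \mathbb{R}^{n \times p}$ is to be determined below.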
Recall that the optimality condition for $K(t)$ is $x(t+1) - K(t)e(t) \perp e(t)$, i.e.,


$$ E\{[x(t+1) - K(t)e(t)]e^T(t)\} = 0 $$


so that
$$ K(t) = E\{x(t+1)e^T(t)\}\big(E\{e(t)e^T(t)\}\big)^{-1} \tag{5.29} $$


We see from the definition of $e(t)$ that
$$
\begin{aligned}
E\{x(t+1)e^T(t)\} &= E\{[A(t)x(t) + w(t)][C(t)\tilde{x}(t \mid t-1) + v(t)]^T\} \\
&= A(t)E\{x(t)\tilde{x}^T(t \mid t-1)\}C^T(t) + A(t)E\{x(t)v^T(t)\} \\
&\quad + E\{w(t)\tilde{x}^T(t \mid t-1)\}C^T(t) + E\{w(t)v^T(t)\}
\end{aligned}
\tag{5.30}
$$


Noting that $w(t)$, $v(t)$ are white noises, the second and third terms in the above
equation vanish, and the fourth term becomes $S(t)$. Also, we have


$$ x(t) = \hat{x}(t \mid t-1) + \tilde{x}(t \mid t-1), \qquad \hat{x}(t \mid t-1) \perp \tilde{x}(t \mid t-1) $$


It therefore follows that
$$ E\{x(t)\tilde{x}^T(t \mid t-1)\} = E\{\tilde{x}(t \mid t-1)\tilde{x}^T(t \mid t-1)\} = P(t \mid t-1) $$
Thus from (5.30), we get
$$ E\{x(t+1)e^T(t)\} = A(t)P(t \mid t-1)C^T(t) + S(t) $$


Since $R(t) > 0$, we see that $E\{e(t)e^T(t)\} = C(t)P(t \mid t-1)C^T(t) + R(t) > 0$.
Thus, from (5.29),


$$ K(t) = [A(t)P(t \mid t-1)C^T(t) + S(t)][C(t)P(t \mid t-1)C^T(t) + R(t)]^{-1} \tag{5.31} $$


where $K(t) \in \mathbb{R}^{n \times p}$ is called the Kalman gain.

For simplicity, we write the one-step predicted estimate as $\hat{x}(t)$. Accordingly, the
corresponding estimation error and error covariance matrix are written as $\tilde{x}(t)$ and $P(t)$,
respectively. The filtered estimate and the filtered error covariance matrix, however, are
written as $\hat{x}(t \mid t)$ and $P(t \mid t)$ without abbreviation.


Lemma 5.8. The one-step predicted estimate satisfies
$$ \hat{x}(t+1) = A(t)\hat{x}(t) + K(t)[y(t) - C(t)\hat{x}(t)] \tag{5.32} $$
with $\hat{x}(0) = \mu_x(0)$, and the error covariance matrix is given by


$$ P(t+1) = A(t)P(t)A^T(t) - K(t)[C(t)P(t)C^T(t) + R(t)]K^T(t) + Q(t), \quad P(0) = \Pi(0) \tag{5.33} $$


Also, the predicted estimate $\hat{x}(t+1 \mid t)$ is unbiased, i.e.,
$$ E\{x(t+1) - \hat{x}(t+1)\} = 0, \quad t = 0, 1, \dots \tag{5.34} $$


Proof. Equation (5.32) is immediate from (5.26), (5.27) and (5.28). Now, it follows
from (5.17a) and (5.32) that the prediction error satisfies


$$ \tilde{x}(t+1) = [A(t) - K(t)C(t)]\tilde{x}(t) + w(t) - K(t)v(t) \tag{5.35} $$


Since $w(t)$ and $v(t)$ are white noises with mean zero, taking the expectation of both sides
of (5.35) yields


$$ E\{\tilde{x}(t+1)\} = [A(t) - K(t)C(t)]E\{\tilde{x}(t)\} $$
From the initial condition $\hat{x}(0) = \mu_x(0)$, we have $E\{\tilde{x}(0)\} = 0$, so that
$$ E\{\tilde{x}(t+1)\} = [A(t) - K(t)C(t)]\cdots[A(0) - K(0)C(0)]E\{\tilde{x}(0)\} = 0 $$


This proves (5.34). Also, $w(t)$ and $v(t)$ are independent of $\tilde{x}(t)$. Computing the covariance matrix of both sides of (5.35) and using (5.18), we therefore see that
$$ P(t+1) = [A(t) - K(t)C(t)]P(t)[A(t) - K(t)C(t)]^T + Q(t) + K(t)R(t)K^T(t) - S(t)K^T(t) - K(t)S^T(t) $$


By using $K(t)$ of (5.31), we obtain (5.33).


If the matrix $C(t)P(t)C^T(t) + R(t)$ is singular, the inverse in (5.31) is to be
replaced by the pseudo-inverse. Given the one-step prediction $\hat{x}(t) := \hat{x}(t \mid t-1)$ and
the new observation $y(t)$, we can compute the new one-step prediction $\hat{x}(t+1) :=
\hat{x}(t+1 \mid t)$ from (5.32). Here the Kalman gain $K(t)$ represents
the relative weight of the information about the state vector $x(t+1)$ contained in the
innovation $e(t)$.


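Lemma 5.9. Given the one-step predicted estimate $\hat{x}(t)$, the filtered estimate $\hat{x}(t \mid t)$
and its error covariance matrix $P(t \mid t)$ are given by


$$ \hat{x}(t \mid t) = \hat{x}(t) + K_f(t)e(t) \tag{5.36} $$
$$ K_f(t) = P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1} \tag{5.37} $$
and
$$ P(t \mid t) = P(t) - P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1}C(t)P(t) \tag{5.38} $$
Proof. By definition, the filtered estimate is given by
$$
\begin{aligned}
\hat{x}(t \mid t) &= \hat{E}\{x(t) \mid \mathcal{Y}_t\} = \hat{E}\{x(t) \mid \mathcal{Y}_{t-1} \oplus e(t)\} \\
&= \hat{E}\{x(t) \mid \mathcal{Y}_{t-1}\} + \hat{E}\{x(t) \mid e(t)\} = \hat{x}(t) + \hat{E}\{x(t) \mid e(t)\}
\end{aligned}
$$
where we have
$$
\begin{aligned}
\hat{E}\{x(t) \mid e(t)\} &= E\{x(t)e^T(t)\}\big(E\{e(t)e^T(t)\}\big)^{-1}e(t) \\
&= E\{x(t)[\tilde{x}^T(t)C^T(t) + v^T(t)]\}\big(E\{e(t)e^T(t)\}\big)^{-1}e(t) \\
&= P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1}e(t) =: K_f(t)e(t)
\end{aligned}
$$
This proves (5.36) and (5.37). Moreover, the estimation error is given by
$$ \tilde{x}(t \mid t) = \tilde{x}(t) - P(t)C^T(t)[R(t) + C(t)P(t)C^T(t)]^{-1}e(t) $$


Noting that $E\{\tilde{x}(t)e^T(t)\} = P(t)C^T(t)$ and taking the covariance matrices of both
sides of the above equation yields (5.38).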


Using the algorithm (5.36)~(5.38), the filtered estimate $\hat{x}(t \mid t)$ and the associated
error covariance matrix $P(t \mid t)$ can be computed from the predicted estimate $\hat{x}(t)$
and the associated error covariance matrix $P(t)$. Summarizing the above results, we
have the following filter algorithm.
Theorem 5.1. (Kalman filter) The Kalman filter algorithm for the discrete-time
stochastic system described by (5.17) and (5.18) is given by the following (i)~(v).


(i) Filter equations
$$ \hat{x}(t+1) = A(t)\hat{x}(t) + K(t)[y(t) - C(t)\hat{x}(t)] \tag{5.39a} $$
$$ \hat{x}(t \mid t) = \hat{x}(t) + K_f(t)[y(t) - C(t)\hat{x}(t)] \tag{5.39b} $$
(ii) The innovation process
$$ e(t) = y(t) - C(t)\hat{x}(t) \tag{5.40} $$
(iii) Kalman gains
$$ K(t) = [A(t)P(t)C^T(t) + S(t)][C(t)P(t)C^T(t) + R(t)]^{-1} \tag{5.41a} $$
$$ K_f(t) = P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1} \tag{5.41b} $$
(iv) Error covariance matrices
$$ P(t+1) = A(t)P(t)A^T(t) - K(t)[C(t)P(t)C^T(t) + R(t)]K^T(t) + Q(t) \tag{5.42a} $$
$$ P(t \mid t) = P(t) - P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1}C(t)P(t) \tag{5.42b} $$


(v) Initial conditions


$$ \hat{x}(0) = \mu_x(0), \qquad P(0) = \Pi(0) \tag{5.43} $$


Figure 5.3 displays a block diagram of the Kalman filter that produces the one-step
predicted estimates $\hat{x}(t)$ and $\hat{x}(t+1)$ with the input $y(t)$.


Figure 5.3. Block diagram of Kalman filter 


The structure of the Kalman filter shown above is quite similar to that of the discrete-time
stochastic system of Figure 5.1, except that the Kalman filter has a feedback loop
with a time-varying gain $K(t)$. We see that the Kalman filter is a dynamic system
that recursively produces the state estimates $\hat{x}(t+1)$ and $\hat{x}(t \mid t)$ by updating the old
estimates based on the received output data $y(t)$. The Kalman filter is, therefore, an
algorithm suitable for on-line state estimation.

Equation (5.42a) is a discrete-time Riccati equation satisfied by $P(t) \in \mathbb{R}^{n \times n}$.
Since $P(t)$ is symmetric, (5.42a) consists of $n(n+1)/2$ nonlinear difference equations. We
see that the Riccati equation is determined by the model and the statistics of the noise
processes, and is independent of the observations. Thus, given the initial condition
$P(0) = \Pi(0)$, we can recursively compute $P(t)$, $t = 1, 2, \dots$, and hence $K(t)$, $t =
0, 1, \dots$, off-line.
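A compact sketch of Theorem 5.1 in Python follows, assuming NumPy. For brevity it assumes constant matrices $A$, $C$, $Q$, $S$, $R$, although the theorem allows them to vary with $t$. The function name `kalman_filter` and the simulated model are hypothetical. As noted above, the gain and covariance lines never use the data `ys`, so they could equally be precomputed off-line.

```python
import numpy as np

def kalman_filter(A, C, Q, S, R, mu0, Pi0, ys):
    """Theorem 5.1 with constant A, C, Q, S, R (a simplifying assumption).

    ys has shape (T, p). Returns predicted estimates x_hat(t) = x_hat(t|t-1),
    filtered estimates x_hat(t|t), innovations e(t), and P(t), P(t|t).
    """
    x_pred = np.array(mu0, dtype=float)          # x_hat(0) = mu_x(0)   (5.43)
    P = np.array(Pi0, dtype=float)               # P(0) = Pi(0)         (5.43)
    out = {k: [] for k in ("x_pred", "x_filt", "e", "P", "P_filt")}
    for y in ys:
        G = C @ P @ C.T + R                      # E{e(t) e^T(t)}, see (5.24)
        G_inv = np.linalg.inv(G)
        K = (A @ P @ C.T + S) @ G_inv            # Kalman gain (5.41a)
        K_f = P @ C.T @ G_inv                    # filter gain (5.41b)
        e = y - C @ x_pred                       # innovation (5.40)
        out["x_pred"].append(x_pred)
        out["P"].append(P)
        out["e"].append(e)
        out["x_filt"].append(x_pred + K_f @ e)   # (5.39b)
        out["P_filt"].append(P - K_f @ C @ P)    # (5.42b)
        x_pred = A @ x_pred + K @ e              # (5.39a)
        P = A @ P @ A.T - K @ G @ K.T + Q        # Riccati equation (5.42a)
    return {k: np.array(v) for k, v in out.items()}

# Hypothetical model with correlated noises (S != 0)
rng = np.random.default_rng(3)
A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
Q = np.diag([0.5, 0.3])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])
mu0, Pi0 = np.zeros(2), np.eye(2)

# Simulate (5.17) with the joint noise covariance (5.18)
noise_cov = np.block([[Q, S], [S.T, R]])
x = rng.multivariate_normal(mu0, Pi0)
ys = []
for t in range(200):
    wv = rng.multivariate_normal(np.zeros(3), noise_cov)
    ys.append(C @ x + wv[2:])                    # y(t) = C x(t) + v(t)
    x = A @ x + wv[:2]                           # x(t+1) = A x(t) + w(t)

res = kalman_filter(A, C, Q, S, R, mu0, Pi0, np.array(ys))
print(res["P"][-1])                              # error covariance after 200 steps
```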

It follows from the definition of the innovation process $e$ that the Kalman filter
equation is also written as


$$ \hat{x}(t+1) = A(t)\hat{x}(t) + K(t)e(t) \tag{5.44a} $$
$$ y(t) = C(t)\hat{x}(t) + e(t) \tag{5.44b} $$


Equation (5.44), as a model of the process $y$, has a state vector and a noise process
different from those of the state space model (5.17), but the two models are equivalent
state space representations that simulate the same output process $y$. The model
(5.44) is called the innovation representation, or innovation model. The innovation
model is less redundant in its noise model, since the single process $e$ drives both the
state and output equations. It is often used in stochastic realization, or state space
system identification.
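The equivalence can be seen numerically. Driving the innovation model (5.44) with the innovations $e(t)$ produced by the filter regenerates the observed $y(t)$ exactly. The sketch below assumes NumPy, $S = 0$, and a hypothetical time-invariant model.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical time-invariant model with uncorrelated noises (S = 0)
A = np.array([[0.8, 0.1], [0.0, 0.5]])
C = np.array([[1.0, 1.0]])
Q, R = 0.2 * np.eye(2), np.array([[0.5]])
mu0, Pi0 = np.zeros(2), np.eye(2)

# Simulate the original model (5.17)
x = rng.multivariate_normal(mu0, Pi0)
ys = []
for _ in range(100):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)

# Predictor (5.39a), (5.41a), (5.42a) with S = 0; keep innovations and gains
x_hat, P = mu0.copy(), Pi0.copy()
es, Ks = [], []
for y in ys:
    G = C @ P @ C.T + R
    K = A @ P @ C.T @ np.linalg.inv(G)
    e = y - C @ x_hat
    es.append(e)
    Ks.append(K)
    x_hat = A @ x_hat + K @ e
    P = A @ P @ A.T - K @ G @ K.T + Q

# Innovation model (5.44) driven by e(t), started from x_hat(0) = mu_x(0)
z = mu0.copy()
y_rebuilt = []
for e, K in zip(es, Ks):
    y_rebuilt.append(C @ z + e)   # (5.44b)
    z = A @ z + K @ e             # (5.44a)

print(np.max(np.abs(np.array(y_rebuilt) - np.array(ys))))
```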


Example 5.4. Consider an AR model described by
$$ y(t) = \theta y(t-1) + v(t) $$

where $\theta$ is an unknown parameter, and $v$ is a white noise with $N(0, r)$. The problem is
to estimate the unknown parameter $\theta$ based on the observations $\mathcal{Y}_t$. Define $x(t) = \theta$
and $w = 0$. The AR model is then rewritten as a state space model


$$ x(t+1) = x(t), \qquad y(t) = c(t)x(t) + v(t) \tag{5.45} $$


where $c(t) = y(t-1)$. The state estimate based on the observations gives the least-squares
estimate of the unknown parameter, i.e.,


$$ \hat{x}(t+1) = E\{x(t+1) \mid \mathcal{Y}_t\} = \hat{\theta}(t+1) = E\{\theta \mid \mathcal{Y}_t\} $$


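Applying the Kalman filter algorithm of Theorem 5.1 to (5.45) yields


$$ \hat{\theta}(t+1) = \hat{\theta}(t) + \frac{p(t)c(t)}{r + c^2(t)p(t)}[y(t) - c(t)\hat{\theta}(t)] $$
$$ p(t+1) = \frac{r\,p(t)}{r + c^2(t)p(t)}, \quad p(0) = p_0 > 0 $$
Since $p_0 > 0$, we have $p(t) > 0$ for all $t \geq 0$. Thus the inverse $p^{-1}(t)$ satisfies
$$ p^{-1}(t+1) = p^{-1}(t) + \frac{c^2(t)}{r}, \quad p^{-1}(0) = p_0^{-1} $$
so that


$$ p^{-1}(t) = p_0^{-1} + \frac{1}{r}\sum_{i=0}^{t-1} c^2(i) = p_0^{-1} + \frac{1}{r}\sum_{i=0}^{t-1} y^2(i-1) $$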
i=l 


Since, from Example 4.4, the process $y$ is ergodic, we have


$$ \lim_{t \to \infty} \frac{1}{t}\sum_{i=0}^{t-1} y^2(i-1) = \sigma_y^2 $$


in the quadratic mean. Hence, for large $t$,


$$ p^{-1}(t) \approx p_0^{-1} + \frac{\sigma_y^2}{r}t \;\Longrightarrow\; p(t) \approx \frac{r}{\sigma_y^2}\cdot\frac{1}{t} $$


showing that the estimate $\hat{\theta}(t)$ converges to the true $\theta$ in the mean-square sense, with
asymptotic variance of order $1/t$.
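A simulation sketch of Example 5.4 follows. It assumes NumPy, a true parameter $\theta = 0.8$, $r = 1$, a simulation that starts from $y(0) = 0$, and the prior guess $\hat{\theta} = 0$ with $p_0 = 10$. All of these are hypothetical choices. The scalar recursion is the familiar recursive least-squares update. The code also checks the closed form for $p^{-1}(t)$ and compares $p(t)$ with the approximation $r/(\sigma_y^2 t)$, where $\sigma_y^2 = r/(1 - \theta^2)$ for this AR(1) process.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, r, T = 0.8, 1.0, 20_000           # hypothetical true parameter, noise variance, length
y = np.zeros(T + 1)                      # y[0] = 0 starts the simulation
for t in range(1, T + 1):
    y[t] = theta * y[t - 1] + rng.normal(0.0, np.sqrt(r))

theta_hat, p0 = 0.0, 10.0                # prior guess and p(0) = p0 > 0 (assumed)
p = p0
for t in range(1, T + 1):
    c = y[t - 1]                          # regressor c = y(t-1)
    k = p * c / (r + c * c * p)           # scalar Kalman gain
    theta_hat += k * (y[t] - c * theta_hat)
    p = r * p / (r + c * c * p)           # Riccati update for p

# Closed form of the inverse recursion: 1/p = 1/p0 + (1/r) * sum of c^2
p_closed = 1.0 / (1.0 / p0 + np.sum(y[:-1] ** 2) / r)
sigma_y2 = r / (1.0 - theta ** 2)         # stationary variance of this AR(1) process
print(theta_hat, p, p_closed, r / (sigma_y2 * T))
```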


Remark 5.1. Recall that the coefficient matrices in the state space model (5.17) are
assumed to be deterministic functions of time $t$. However, the state space model
(5.45) does not satisfy this basic assumption, because $c(t)$ is a function of the
observation $y(t-1)$. Thus, strictly speaking, the algorithm of Theorem 5.1 cannot be
applied to a state space model with random coefficients.


In this regard, we have the following result [28]. Recall that the $\sigma$-algebra $\mathcal{F}_t$ is
defined by $\mathcal{F}_t = \sigma\{y(0), y(1), \dots, y(t)\}$.


Lemma 5.10. Suppose that, for the state space system (5.17), the following conditions (i)~(iv)
are satisfied.
(i) The noise vectors $w$ and $v$ are Gaussian white noises.
(ii) The a priori distribution of the initial state $x(0)$ is Gaussian.
(iii) The matrices $A(t)$, $Q(t)$ are $\mathcal{F}_t$-measurable, and $C(t)$, $S(t)$, $R(t)$ are $\mathcal{F}_{t-1}$-measurable.
(iv) The elements of $A(t)$, $C(t)$, $Q(t)$, $S(t)$, $R(t)$ are bounded.


Then the conditional probability density function $p(x(t) \mid \mathcal{F}_{t-1})$ of the state vector
given the observations is Gaussian.


Proof. For a proof, see [28].


This lemma implies that if the random coefficient matrices satisfy the above conditions,
then the algorithm of Theorem 5.1 is valid, and so is the algorithm of Example
5.4. In this case, however, the estimates $\hat{x}(t \mid t)$ and $\hat{x}(t)$ are to be understood as
conditional mean estimates.

In the next section, we consider stochastic systems with exogenous inputs, which
may be control inputs, reference inputs or probing signals for identification. A
version of the Kalman filter will be derived under the assumption that the inputs are
$\mathcal{F}_t$-measurable.
Figure 5.4. Stochastic system with inputs


5.4 Kalman Filter with Inputs 


Since there are no external inputs in the state space model (5.17), the Kalman filter
algorithm of Theorem 5.1 cannot be applied to systems subjected to exogenous or
control inputs. In this section, we modify the Kalman filter algorithm so that it
can be applied to state space models with inputs.

Consider a discrete-time stochastic linear system


$$ x(t+1) = A(t)x(t) + B(t)u(t) + w(t) \tag{5.46a} $$
$$ y(t) = C(t)x(t) + v(t) \tag{5.46b} $$


where $u(t) \in \mathbb{R}^m$ is the input vector, and $B(t) \in \mathbb{R}^{n \times m}$ is a matrix connecting
the input vector to the system, as shown in Figure 5.4. We assume that $u(t)$ is $\mathcal{F}_t$-measurable,
i.e., $u(t)$ is a function of the outputs $y(0), y(1), \dots, y(t)$. Deterministic time
functions are included as a special case. We say that $\mathcal{F}_t$-measurable inputs are admissible inputs.
Since the class of admissible inputs includes $\mathcal{F}_t$-measurable nonlinear functions, the
process $x$ generated by (5.46a) may be neither Gaussian nor Markov. Of course, if $u(t)$
is a linear output feedback such that


$$ u(t) = L(t)y(t) = L(t)C(t)x(t) + L(t)v(t), \quad L(t) \in \mathbb{R}^{m \times p} $$


then $\{x(t), \ t = 0, 1, \dots\}$ becomes a Gauss-Markov process.
In the following, we derive a filtering algorithm for (5.46) that produces the one-step
predicted estimates $\hat{x}(t)$ and $\hat{x}(t+1)$.


Lemma 5.11. Suppose that $x_w(t)$ and $x_u(t)$ are the solutions of
$$ x_w(t+1) = A(t)x_w(t) + w(t), \quad x_w(0) = x(0) \tag{5.47} $$


and
$$ x_u(t+1) = A(t)x_u(t) + B(t)u(t), \quad x_u(0) = 0 \tag{5.48} $$
respectively. Then the solution $x(t)$ of (5.46a) is expressed as


$$ x(t) = x_w(t) + x_u(t), \quad t = 0, 1, \dots \tag{5.49} $$


Proof. The proof is immediate from the linearity of the system.
By using the state transition matrix of (4.60), the solution of (5.48) is given by


$$ x_u(t) = \sum_{k=0}^{t-1} \Phi(t, k+1)B(k)u(k), \quad t = 0, 1, \dots \tag{5.50} $$

Thus $x_u(t)$ is a function of $u(0), u(1), \dots, u(t-1)$, so that $x_u(t)$
is $\mathcal{F}_{t-1}$-measurable, and hence $\mathcal{F}_t$-measurable. From the property of conditional
expectation,


$$ \hat{x}(t \mid t) = E\{x_w(t) \mid \mathcal{F}_t\} + x_u(t) \tag{5.51a} $$
$$ \hat{x}(t) = E\{x_w(t) \mid \mathcal{F}_{t-1}\} + x_u(t) \tag{5.51b} $$


Since $x_u(t)$ is known, it suffices to derive an algorithm for computing the estimates
of the vector $x_w(t)$ of (5.47) based on the observations.


Lemma 5.12. By using (5.46b) and (5.49), we define
$$ h(t) := y(t) - C(t)x_u(t) = C(t)x_w(t) + v(t) \tag{5.52} $$


Let $\mathcal{F}_t^h$ be the $\sigma$-algebra generated by $\{h(i), \ i = 0, 1, \dots, t\}$. Then $\mathcal{F}_t^h = \mathcal{F}_t$
holds, implying that the process $h$ of (5.52) carries the same information as
the output process $y$.


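Proof. Since $x_u(t)$ is $\mathcal{F}_t$-measurable, we see from (5.52) that $h(t)$ is $\mathcal{F}_t$-measurable.
Thus we get $\mathcal{F}_t^h = \sigma\{h(0), h(1), \dots, h(t)\} \subset \mathcal{F}_t$. Now we show that $\mathcal{F}_t \subset \mathcal{F}_t^h$.
From (5.50) and (5.52),


$$ y(t) = h(t) + C(t)\sum_{k=0}^{t-1} \Phi(t, k+1)B(k)u(k) $$


For $t = 0$, we have $y(0) = h(0)$, so that $\mathcal{F}_0 = \mathcal{F}_0^h$ holds. For $t = 1$, $y(1) = h(1) +
C(1)B(0)u(0)$. Since $u(0)$ is $\mathcal{F}_0$-measurable and $\mathcal{F}_0 = \mathcal{F}_0^h$ holds, $y(1)$ is the sum
of $h(1)$ and the $\mathcal{F}_0^h$-measurable $C(1)B(0)u(0)$, implying that $y(1)$ is $\mathcal{F}_1^h$-measurable.
Thus we get $\mathcal{F}_1 \subset \mathcal{F}_1^h$. Similarly, we can show by induction that $\mathcal{F}_t \subset \mathcal{F}_t^h$ holds. Hence we have
$\mathcal{F}_t = \mathcal{F}_t^h$, $t = 0, 1, \dots$.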


Let the predicted estimates of the state vector $x_w(t)$ of (5.47) be given by
$$ \hat{x}_w(t+1) = E\{x_w(t+1) \mid \mathcal{F}_t^h\}, \qquad \hat{x}_w(t) = E\{x_w(t) \mid \mathcal{F}_{t-1}^h\} $$
It follows from (5.51) that


$$ \hat{x}(t+1) = \hat{x}_w(t+1) + x_u(t+1) \tag{5.53a} $$
$$ \hat{x}(t) = \hat{x}_w(t) + x_u(t) \tag{5.53b} $$


Since the state vector $x_u(t)$ is given by (5.50), the algorithm is complete if we can
compute $\hat{x}_w(t)$ and $\hat{x}_w(t+1)$. From (5.47) and (5.52), we have
$$ x_w(t+1) = A(t)x_w(t) + w(t) \tag{5.54a} $$

$$ h(t) = C(t)x_w(t) + v(t) \tag{5.54b} $$

This is a stochastic linear system with the state vector $x_w(t)$ and the observation
vector $h(t)$. Moreover, this state space model does not contain external inputs; hence
we can apply Theorem 5.1 to derive the Kalman filter algorithm for (5.54).
The innovation process for $h$ of (5.54) is given by

$$ e_h(t) = h(t) - C(t)\hat{x}_w(t) = y(t) - C(t)[x_u(t) + \hat{x}_w(t)] = y(t) - C(t)\hat{x}(t) = e(t) $$

so that $e_h$ coincides with the innovation process $e$ for the observation $y$. Also, from
(5.49) and (5.53), we have


$$ x(t) - \hat{x}(t \mid t) = x_w(t) - \hat{x}_w(t \mid t) $$
$$ x(t+1) - \hat{x}(t+1) = x_w(t+1) - \hat{x}_w(t+1) $$
Thus the error covariance matrices are given by
$$ P(t \mid t) = E\{[x_w(t) - \hat{x}_w(t \mid t)][x_w(t) - \hat{x}_w(t \mid t)]^T\} \tag{5.55a} $$
$$ P(t+1) = E\{[x_w(t+1) - \hat{x}_w(t+1)][x_w(t+1) - \hat{x}_w(t+1)]^T\} \tag{5.55b} $$


This implies that the error covariance matrices are independent of the admissible
input $u$, so that they coincide with the error covariance matrices of the system defined
by (5.54). Hence the prediction error $\tilde{x}(t) = x(t) - \hat{x}(t)$, $t = 0, 1, \dots$, is a
Gauss-Markov process with mean zero and covariance matrix $P(t)$.


Theorem 5.2. Suppose that $u(t)$ in (5.46) is $\mathcal{F}_t$-measurable. Then the Kalman filter
algorithm for the stochastic system with admissible inputs is given by (i)~(iv).


(i) Filter equations


$$ \hat{x}(t+1) = A(t)\hat{x}(t) + B(t)u(t) + K(t)e(t) \tag{5.56a} $$
$$ \hat{x}(t \mid t) = \hat{x}(t) + K_f(t)e(t) \tag{5.56b} $$
$$ e(t) = y(t) - C(t)\hat{x}(t) \tag{5.56c} $$


(ii) Filter gains
$$ K(t) = [A(t)P(t)C^T(t) + S(t)][C(t)P(t)C^T(t) + R(t)]^{-1} \tag{5.57a} $$
$$ K_f(t) = P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1} \tag{5.57b} $$


(iii) Error covariance matrices

$$ P(t+1) = A(t)P(t)A^T(t) - K(t)[C(t)P(t)C^T(t) + R(t)]K^T(t) + Q(t) \tag{5.58a} $$
$$ P(t \mid t) = P(t) - P(t)C^T(t)[C(t)P(t)C^T(t) + R(t)]^{-1}C(t)P(t) \tag{5.58b} $$


(iv) Initial conditions


$$ \hat{x}(0) = \mu_x(0), \qquad P(0) = \Pi(0) \tag{5.59} $$


Proof. It follows from Theorem 5.1 that the Kalman filter for the system described
by (5.54) is given by


$$ \hat{x}_w(t+1) = A(t)\hat{x}_w(t) + K(t)e_h(t) \tag{5.60a} $$
$$ \hat{x}_w(t \mid t) = \hat{x}_w(t) + K_f(t)e_h(t) \tag{5.60b} $$
From (5.48), (5.53), (5.60) and the fact that $e_h(t) = e(t)$, we get
$$
\begin{aligned}
\hat{x}(t+1) &= \hat{x}_w(t+1) + x_u(t+1) \\
&= A(t)\hat{x}_w(t) + K(t)e(t) + A(t)x_u(t) + B(t)u(t) \\
&= A(t)\hat{x}(t) + B(t)u(t) + K(t)e(t)
\end{aligned}
$$
Thus we have (5.56a). From (5.51a), (5.53b) and (5.60),
$$ \hat{x}(t \mid t) = \hat{x}_w(t \mid t) + x_u(t) = \hat{x}_w(t) + K_f(t)e(t) + x_u(t) = \hat{x}(t) + K_f(t)e(t) $$
This proves (5.56b). Equations (5.57)~(5.59) are obvious from (5.55).


Figure 5.5 shows a block diagram of the optimal filter. The form of the optimal filter
may seem obvious in view of Figure 5.4, but the $\mathcal{F}_t$-measurability of the
inputs is needed for the filter in Figure 5.5 to be optimal in the least-squares sense.
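The decomposition used in Lemmas 5.11 and 5.12 can be checked numerically. The sketch below assumes NumPy, and the model, noise levels and feedback law $u(t) = -0.5y(t) + \sin(0.1t)$ are hypothetical. It runs the predictor (5.56a) on $y$ with the input, and the input-free predictor on $h(t) = y(t) - Cx_u(t)$. By (5.53b), the two estimates should differ by exactly $x_u(t)$, and the covariances $P(t)$ should coincide.

```python
import numpy as np

def predictor(A, B, C, Q, S, R, mu0, Pi0, ys, us=None):
    """One-step predictor (5.56a), (5.56c), (5.57a), (5.58a); us=None means no input."""
    x_hat, P = np.array(mu0, dtype=float), np.array(Pi0, dtype=float)
    xs, Ps = [], []
    for t, y in enumerate(ys):
        xs.append(x_hat)
        Ps.append(P)
        G = C @ P @ C.T + R
        K = (A @ P @ C.T + S) @ np.linalg.inv(G)
        e = y - C @ x_hat
        x_hat = A @ x_hat + K @ e
        if us is not None:
            x_hat = x_hat + B @ us[t]            # input term of (5.56a)
        P = A @ P @ A.T - K @ G @ K.T + Q
    return np.array(xs), np.array(Ps)

# Hypothetical model and feedback law, for illustration only
rng = np.random.default_rng(6)
A = np.array([[0.9, 0.3], [-0.2, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, S, R = 0.1 * np.eye(2), np.zeros((2, 1)), np.array([[0.2]])
mu0, Pi0 = np.zeros(2), np.eye(2)

x = rng.multivariate_normal(mu0, Pi0)
x_u = np.zeros(2)                                # x_u(0) = 0, see (5.48)
ys, us, hs, xus = [], [], [], []
for t in range(50):
    y = C @ x + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    u = -0.5 * y + np.sin(0.1 * t)               # uses y(t) and t only: F_t-measurable
    ys.append(y)
    us.append(u)
    hs.append(y - C @ x_u)                       # h(t) of (5.52)
    xus.append(x_u)
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    x_u = A @ x_u + B @ u                        # (5.48)

x_hat_u, P_u = predictor(A, B, C, Q, S, R, mu0, Pi0, ys, us)   # with inputs
x_hat_w, P_w = predictor(A, B, C, Q, S, R, mu0, Pi0, hs)       # on h, no input
print(np.max(np.abs(x_hat_u - (x_hat_w + np.array(xus)))))
```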


Figure 5.5. Block diagram of Kalman filter with inputs 
5.5 Covariance Equation of Predicted Estimate


Recall from (5.17a) that the covariance matrix $\Pi(t) = \operatorname{cov}\{x(t)\}$ of the state vector
$x(t)$ satisfies (4.67). As in (4.77), we define


$$ \bar{C}^T(t) = A(t)\Pi(t)C^T(t) + S(t) $$


It then follows from Lemma 4.9 that the covariance matrix of $y$ is expressed as


$$
\Lambda_{yy}(t, s) =
\begin{cases}
C(t)A(t-1)\cdots A(s+1)\bar{C}^T(s), & t > s \\
C(t)\Pi(t)C^T(t) + R(t), & t = s \\
\bar{C}(t)A^T(t+1)\cdots A^T(s-1)C^T(s), & t < s
\end{cases}
\tag{5.61}
$$

For simplicity, we define $\Lambda(t) := \Lambda_{yy}(t, t)$. Then, in terms of $A(t)$, $C(t)$, $\bar{C}(t)$,
$\Lambda(t)$, we define a new Riccati equation
$$
\begin{aligned}
\Sigma(t+1) = A(t)\Sigma(t)A^T(t) &+ [\bar{C}^T(t) - A(t)\Sigma(t)C^T(t)] \\
&\times [\Lambda(t) - C(t)\Sigma(t)C^T(t)]^{-1}[\bar{C}(t) - C(t)\Sigma(t)A^T(t)]
\end{aligned}
\tag{5.62}
$$


with $\Sigma(0) = 0$. The following theorem gives a relation between the new Riccati
equation (5.62) and the Riccati equation (5.42a) satisfied by $P(t)$.


Theorem 5.3. The solution $\Sigma(t)$ of the Riccati equation (5.62) is the covariance matrix
of the predicted estimate $\hat{x}(t)$, and the relation


$$ P(t) = \Pi(t) - \Sigma(t) \tag{5.63} $$
holds. Moreover, the Kalman gain of (5.41a) is equivalently expressed as
$$ K(t) = [\bar{C}^T(t) - A(t)\Sigma(t)C^T(t)][\Lambda(t) - C(t)\Sigma(t)C^T(t)]^{-1} \tag{5.64} $$
Proof. From $\Sigma(0) = 0$, (5.63) is obvious for $t = 0$. Since
$$ \Lambda(0) = C(0)\Pi(0)C^T(0) + R(0), \qquad \bar{C}^T(0) = A(0)\Pi(0)C^T(0) + S(0) $$
we see from (5.64) that
$$ K(0) = [A(0)\Pi(0)C^T(0) + S(0)][C(0)\Pi(0)C^T(0) + R(0)]^{-1} $$


Since $P(0) = \Pi(0)$, the right-hand side of the above equation equals the Kalman gain at $t = 0$ [see
(5.41a)]. Suppose that (5.63) and (5.64) are valid up to time $t$. Then, from (4.67) and
(5.64),


$$
\begin{aligned}
\Pi(t+1) - \Sigma(t+1) &= A(t)\Pi(t)A^T(t) + Q(t) - A(t)\Sigma(t)A^T(t) - K(t)[\Lambda(t) - C(t)\Sigma(t)C^T(t)]K^T(t) \\
&= A(t)P(t)A^T(t) + Q(t) - K(t)[C(t)P(t)C^T(t) + R(t)]K^T(t) = P(t+1)
\end{aligned}
$$
This implies that (5.63) holds for time $t+1$. Further, we have
$$ \bar{C}^T(t+1) - A(t+1)\Sigma(t+1)C^T(t+1) = A(t+1)P(t+1)C^T(t+1) + S(t+1) $$
$$ \Lambda(t+1) - C(t+1)\Sigma(t+1)C^T(t+1) = C(t+1)P(t+1)C^T(t+1) + R(t+1) $$


so that (5.64) also holds for time $t+1$.
We show that $\Sigma(t) = \operatorname{cov}\{\hat{x}(t)\}$. By the property of conditional expectation,


$$ E\{\hat{x}(t)\} = E\{E\{x(t) \mid \mathcal{F}_{t-1}\}\} = E\{x(t)\} = \mu_x(t) $$


Hence we have
$$ x(t) - \mu_x(t) = \hat{x}(t) - \mu_x(t) + \tilde{x}(t) $$


where $\hat{x}(t) - \mu_x(t) \perp \tilde{x}(t)$. Computing the covariance matrix of both sides of the above equation
yields
$$ \Pi(t) = E\{[\hat{x}(t) - \mu_x(t)][\hat{x}(t) - \mu_x(t)]^T\} + E\{\tilde{x}(t)\tilde{x}^T(t)\} $$
where $E\{\tilde{x}(t)\tilde{x}^T(t)\} = P(t)$, so that, by (5.63),


$$ E\{[\hat{x}(t) - \mu_x(t)][\hat{x}(t) - \mu_x(t)]^T\} = \Pi(t) - P(t) = \Sigma(t) $$


as was to be proved.


It should be noted that the Riccati equation (5.62) is defined by using only the
covariance data of the output signal $y$ [see (5.61)], so that no information about the noise
covariance matrices $Q(t)$, $S(t)$, $R(t)$ is used. Thus, if the statistical properties of $y$ are
the same, the Kalman gains are the same even if the state space realizations are different [146].
The Riccati equation (5.62) satisfied by the covariance matrix $\Sigma(t)$ of
the predicted estimate plays an important role in the stochastic realization theory to be
developed in Chapter 7.
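Theorem 5.3 can be checked by running three recursions side by side for a time-invariant model. The first is the state covariance recursion $\Pi(t+1) = A\Pi(t)A^T + Q$. The second is the Riccati equation (5.42a) for $P(t)$. The third is the Riccati equation (5.62) for $\Sigma(t)$, which uses only the covariance data $\bar{C}(t)$ and $\Lambda(t)$. The sketch assumes NumPy and a hypothetical model, and it asserts (5.63) and (5.64) at every step.

```python
import numpy as np

# Hypothetical time-invariant model (A stable, joint noise covariance positive definite)
A = np.array([[0.7, 0.2], [0.1, 0.5]])
C = np.array([[1.0, -1.0]])
Q = np.array([[0.4, 0.1], [0.1, 0.3]])
S = np.array([[0.05], [0.1]])
R = np.array([[0.8]])

Pi = np.eye(2)                            # Pi(0) = cov{x(0)}
P, Sigma = Pi.copy(), np.zeros((2, 2))    # P(0) = Pi(0), Sigma(0) = 0

for t in range(30):
    # Covariance data of y: Cbar^T(t) = A Pi(t) C^T + S, Lambda(t) = C Pi(t) C^T + R
    CbarT = A @ Pi @ C.T + S
    Lam = C @ Pi @ C.T + R
    # Kalman gain from (5.41a) and from the covariance-data form (5.64)
    G = C @ P @ C.T + R
    K = (A @ P @ C.T + S) @ np.linalg.inv(G)
    M = CbarT - A @ Sigma @ C.T
    D_inv = np.linalg.inv(Lam - C @ Sigma @ C.T)
    assert np.allclose(K, M @ D_inv)              # (5.64)
    assert np.allclose(P, Pi - Sigma)             # (5.63)
    # Advance the three recursions using the values at time t
    Sigma = A @ Sigma @ A.T + M @ D_inv @ M.T     # (5.62)
    P = A @ P @ A.T - K @ G @ K.T + Q             # (5.42a)
    Pi = A @ Pi @ A.T + Q                         # state covariance recursion

print(np.round(Sigma, 4))
```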


5.6 Stationary Kalman Filter 


Consider the Kalman filter for the stochastic LTI system of (4.70). Since all the
system parameters are time-invariant, it follows from (5.39a) in Theorem 5.1 that the
Kalman filter is expressed as


$$ \hat{x}(t+1) = A\hat{x}(t) + K(t)[y(t) - C\hat{x}(t)] \tag{5.65} $$


where $\hat{x}(t) := \hat{x}(t \mid t-1)$ with the initial condition $\hat{x}(0) = \mu_x(0)$, and where the
Kalman gain is given by


$$ K(t) = [AP(t)C^T + S][CP(t)C^T + R]^{-1} $$
Also, the error covariance matrix $P(t) := P(t \mid t-1)$ satisfies the Riccati equation

$$
P(t+1) = AP(t)A^T - K(t)[CP(t)C^T + R]K^T(t) + Q \tag{5.66}
$$

with $P(0) = \Pi(0)$.

Suppose that a solution $P(t)$ of the Riccati equation (5.66) converges to a constant matrix as $t \to \infty$. Putting $P(t) = P(t+1) = P$ in (5.66), we get the algebraic Riccati equation (ARE)


$$
P = APA^T - (APC^T + S)(CPC^T + R)^{-1}(APC^T + S)^T + Q \tag{5.67}
$$

In this case, $K(t)$ converges to the stationary Kalman gain

$$
K = (APC^T + S)(CPC^T + R)^{-1} \tag{5.68}
$$

Hence, the filter equation (5.65) becomes

$$
\hat x(t+1) = (A - KC)\hat x(t) + Ky(t) \tag{5.69}
$$


This filter is called a stationary Kalman filter that produces the one-step predicted 
estimates of the state vector. 
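As a rough numerical illustration of (5.66) to (5.69), the sketch below iterates the Riccati recursion from $P(0) = 0$, forms the stationary gain $K$, and checks that $A - KC$ is stable. It is a minimal NumPy sketch, not a production ARE solver; the system matrices are arbitrary illustrative values (chosen so that the joint noise covariance is positive definite), and the function name `stationary_kalman_gain` is hypothetical.

```python
import numpy as np


def stationary_kalman_gain(A, C, Q, S, R, iters=1000):
    """Iterate the Riccati equation (5.66) from P(0) = 0 and return (P, K)."""
    P = np.zeros_like(A)
    for _ in range(iters):
        W = C @ P @ C.T + R                        # innovation covariance
        K = (A @ P @ C.T + S) @ np.linalg.inv(W)   # time-varying gain K(t)
        P = A @ P @ A.T - K @ W @ K.T + Q          # (5.66)
    W = C @ P @ C.T + R
    K = (A @ P @ C.T + S) @ np.linalg.inv(W)       # stationary gain (5.68)
    return P, K


# Illustrative (hypothetical) system: stable A, scalar output.
A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])

P, K = stationary_kalman_gain(A, C, Q, S, R)

# Residual of the ARE (5.67) and spectral radius of A - KC in (5.69).
G = A @ P @ C.T + S
are_residual = A @ P @ A.T - G @ np.linalg.inv(C @ P @ C.T + R) @ G.T + Q - P
rho = max(abs(np.linalg.eigvals(A - K @ C)))
print(np.max(np.abs(are_residual)), rho)
```

A fixed number of iterations keeps the sketch short; the GEP method of Section 5.8 computes the same solution directly.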

In the following, we define $\Phi := A - SR^{-1}C$ and $M := Q - SR^{-1}S^T$. Then, it can be shown that the ARE (5.67) reduces to

$$
P = \Phi\big(P - PC^T(CPC^T + R)^{-1}CP\big)\Phi^T + M \tag{5.70}
$$
Theorem 5.4. The following statements are equivalent. 
(i) The pair $(\Phi, M^{1/2})$ is stabilizable and $(C, \Phi)$ is detectable.

(ii) There exists a unique nonnegative definite solution $P$ of the ARE (5.70); moreover, $P$ is stabilizing, i.e., $\Phi - \Gamma C$ is stable, where

$$
\Gamma = \Phi PC^T(CPC^T + R)^{-1}
$$

Under the above condition (i), the solution $P(t)$ of the Riccati equation (5.66) converges to the unique nonnegative definite solution $P$ for any $P(0) \ge 0$.


Proof. For proof, see [11, 20,97]. 
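The change of variables behind (5.70), and the identity $\Phi - \Gamma C = A - KC$ asked for in Problem 5.6, can be checked numerically. The following NumPy sketch uses illustrative (hypothetical) system matrices. It evaluates the right-hand sides of (5.67) and (5.70) at an arbitrary nonnegative definite $P$, then iterates (5.70) to its fixed point and compares the two closed-loop matrices there.

```python
import numpy as np

A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])

R_inv = np.linalg.inv(R)
Phi = A - S @ R_inv @ C      # Phi := A - S R^{-1} C
M = Q - S @ R_inv @ S.T      # M := Q - S R^{-1} S^T


def step_567(P):
    """Right-hand side of (5.67), viewed as a map of P."""
    W = C @ P @ C.T + R
    G = A @ P @ C.T + S
    return A @ P @ A.T - G @ np.linalg.inv(W) @ G.T + Q


def step_570(P):
    """Right-hand side of (5.70), viewed as a map of P."""
    W = C @ P @ C.T + R
    return Phi @ (P - P @ C.T @ np.linalg.inv(W) @ C @ P) @ Phi.T + M


rng = np.random.default_rng(0)
X = rng.standard_normal((2, 2))
P_test = X @ X.T                                 # arbitrary P >= 0
form_gap = np.max(np.abs(step_567(P_test) - step_570(P_test)))

P = np.zeros((2, 2))
for _ in range(1000):                            # iterate (5.70) to its fixed point
    P = step_570(P)
W_inv = np.linalg.inv(C @ P @ C.T + R)
K = (A @ P @ C.T + S) @ W_inv                    # stationary gain (5.68)
Gamma = Phi @ P @ C.T @ W_inv                    # Gamma of Theorem 5.4
loop_gap = np.max(np.abs((Phi - Gamma @ C) - (A - K @ C)))
print(form_gap, loop_gap)
```

Both gaps are at round-off level, consistent with the two forms of the Riccati map being algebraically identical.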


Example 5.5. Consider a scalar system

$$
x(t+1) = ax(t) + w(t), \quad y(t) = x(t) + v(t)
$$

where $A = a$, $C = 1$, $Q = q > 0$, $S = 0$, $R = r > 0$. From (5.39a) to (5.42b) of Theorem 5.1, the Kalman filter and the Riccati equation are given by


$$
\hat x(t+1) = \frac{ar}{p(t) + r}\hat x(t) + \frac{ap(t)}{p(t) + r}y(t) \tag{5.71}
$$

$$
p(t+1) = \frac{a^2rp(t)}{p(t) + r} + q, \quad p(0) = p_0 \tag{5.72}
$$
Thus the ARE reduces to $p^2 + [(1 - a^2)r - q]p - qr = 0$, so that the ARE has two solutions

$$
p_+ = \frac{1}{2}\Big[(a^2 - 1)r + q + \sqrt{[(a^2 - 1)r + q]^2 + 4qr}\,\Big] > 0
$$

$$
p_- = \frac{1}{2}\Big[(a^2 - 1)r + q - \sqrt{[(a^2 - 1)r + q]^2 + 4qr}\,\Big] < 0
$$

Putting $a = 0.8$, $r = 1$, $q = 2$, we have $p_+ = 2.4547$ and $p_- = -0.8147$. Figure 5.6 shows the solutions of (5.72) for ten random initial values $p_0 \sim N(0, 4)$. In this case, all the solutions have converged to $p_+ = 2.4547$.


Figure 5.6. Solutions of the Riccati equation (5.72) for random initial values, where the initial 
time is taken as t = 1 for convenience 
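The convergence shown in Figure 5.6 can be reproduced with a few lines of plain Python. The sketch below iterates (5.72) for $a = 0.8$, $r = 1$, $q = 2$ and compares the limit with the closed-form root $p_+$. For reproducibility it assumes a fixed list of initial values $p_0$ in place of the ten random draws from $N(0, 4)$ used for the figure; the helper names are illustrative.

```python
import math


def riccati_step(p, a, q, r):
    """One step of (5.72): p(t+1) = a^2 r p(t) / (p(t) + r) + q."""
    return a * a * r * p / (p + r) + q


def are_roots(a, q, r):
    """Roots p_+ >= p_- of p^2 + [(1 - a^2) r - q] p - q r = 0."""
    b = (a * a - 1.0) * r + q
    disc = math.sqrt(b * b + 4.0 * q * r)
    return 0.5 * (b + disc), 0.5 * (b - disc)


a, q, r = 0.8, 2.0, 1.0
p_plus, p_minus = are_roots(a, q, r)

# Fixed initial values standing in for random draws p0 ~ N(0, 4);
# p_- = -0.8147 and p0 = -r = -1 (where p + r = 0) are avoided.
initial_values = [-3.0, -2.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0]
final_values = []
for p in initial_values:
    for _ in range(50):
        p = riccati_step(p, a, q, r)
    final_values.append(p)

print(round(p_plus, 4), round(p_minus, 4))                 # 2.4547 -0.8147
print(max(abs(p - p_plus) for p in final_values) < 1e-9)    # True
```

The root $p_-$ is a repelling fixed point of (5.72), since the slope $a^2r^2/(p + r)^2$ of the map exceeds one there, which is why every trajectory ends at $p_+$.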


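We see that the stationary Kalman filter is expressed as

$$
\hat x(t+1) = A\hat x(t) + Ke(t) \tag{5.73a}
$$

$$
y(t) = C\hat x(t) + e(t) \tag{5.73b}
$$

By means of Theorem 5.3, the stationary Kalman gain is also expressed as

$$
K = (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1} \tag{5.74}
$$

where the covariance matrix $\Sigma = \operatorname{cov}\{\hat x(t)\}$ satisfies the ARE

$$
\Sigma = A\Sigma A^T + (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}(\bar C - C\Sigma A^T) \tag{5.75}
$$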

The state space equation (5.73) is called a stationary forward innovation model for 
the stationary process y, where the noise model is less redundant than that of (4.70). 
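The fact that (5.74) and (5.75) need only the covariance data of $y$ can be illustrated numerically. The NumPy sketch below uses illustrative (hypothetical) matrices. It builds $\Pi$ from the Lyapunov equation $\Pi = A\Pi A^T + Q$, sets $\bar C^T = A\Pi C^T + S$ (so that $\Lambda(1) = C\bar C^T$) and $\Lambda(0) = C\Pi C^T + R$, and iterates (5.75) from $\Sigma = 0$. In parallel it iterates (5.66) from $P(0) = \Pi$ using the noise covariances, then compares the gains (5.74) and (5.68) and checks $P = \Pi - \Sigma$.

```python
import numpy as np

A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])
n = A.shape[0]

# State covariance: Pi = A Pi A^T + Q, solved by (row-major) vectorization.
Pi = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
Cbar_T = A @ Pi @ C.T + S            # so that Lambda(1) = C Cbar^T
Lam0 = C @ Pi @ C.T + R              # Lambda(0)

Sigma = np.zeros((n, n))             # (5.75) uses only (A, C, Cbar, Lambda(0))
P = Pi.copy()                        # (5.66) uses (Q, S, R), started at P(0) = Pi
for _ in range(1000):
    G = Cbar_T - A @ Sigma @ C.T
    Sigma = A @ Sigma @ A.T + G @ np.linalg.inv(Lam0 - C @ Sigma @ C.T) @ G.T
    W = C @ P @ C.T + R
    K_t = (A @ P @ C.T + S) @ np.linalg.inv(W)
    P = A @ P @ A.T - K_t @ W @ K_t.T + Q

K_cov = (Cbar_T - A @ Sigma @ C.T) @ np.linalg.inv(Lam0 - C @ Sigma @ C.T)   # (5.74)
K_noise = (A @ P @ C.T + S) @ np.linalg.inv(C @ P @ C.T + R)                # (5.68)
print(K_cov.ravel(), K_noise.ravel())
```

Starting the two recursions at $\Sigma = 0$ and $P = \Pi$ mirrors the induction used for (5.63) and (5.64), so the identity $\Sigma(t) = \Pi - P(t)$ holds at every step, not only in the limit.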




5.7 Stationary Backward Kalman Filter 


In this section, we derive the Kalman filter for the backward Markov model for the 
stationary process y, which is useful for modeling stationary processes. 
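Consider the backward Markov model of (4.83), i.e.,

$$
x_b(t-1) = A^Tx_b(t) + w_b(t) \tag{5.76a}
$$

$$
y(t) = \bar Cx_b(t) + v_b(t) \tag{5.76b}
$$

where $w_b$ and $v_b$ are white noises with covariance matrices

$$
E\left\{\begin{bmatrix} w_b(t) \\ v_b(t)\end{bmatrix}\begin{bmatrix} w_b^T(s) & v_b^T(s)\end{bmatrix}\right\} = \begin{bmatrix} Q_b & S_b \\ S_b^T & R_b\end{bmatrix}\delta_{ts} \tag{5.77}
$$

Moreover, we have $\operatorname{cov}\{x_b(t)\} = \bar\Pi = \Pi^{-1}$ and

$$
Q_b = \bar\Pi - A^T\bar\Pi A, \quad S_b = C^T - A^T\bar\Pi\bar C^T, \quad R_b = \Lambda(0) - \bar C\bar\Pi\bar C^T \tag{5.78}
$$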
In order to derive the backward Kalman filter, we define the future space

$$
\mathcal{Y}_t^+ := \operatorname{span}\{y(t), y(t+1), \ldots\} \tag{5.79}
$$

Since we deal with stationary processes with mean zero, no constant vectors are included in the right-hand side of (5.79), unlike the past space defined by (5.20). Then, the one-step backward predicted estimate is defined by

$$
\hat x_b(t) = E\{x_b(t) \mid \mathcal{Y}_{t+1}^+\} \tag{5.80}
$$

Also, we define the backward innovation process by

$$
e_b(t) = y(t) - E\{y(t) \mid \mathcal{Y}_{t+1}^+\} \tag{5.81}
$$


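Lemma 5.13. The backward process $e_b$ is a backward white noise with mean zero and covariance matrix

$$
\operatorname{cov}\{e_b(t)\} = \Lambda(0) - \bar C\bar\Sigma\bar C^T \tag{5.82}
$$

where $\bar\Sigma = \operatorname{cov}\{\hat x_b(t)\}$.

Proof. By the definition of orthogonal projection, we see that $E\{e_b(t) \mid \mathcal{Y}_{t+1}^+\} = 0$ and $E\{e_b(t)\} = 0$. For $s < t$, it follows that $e_b(t) \in \mathcal{Y}_t^+ \subset \mathcal{Y}_{s+1}^+$, so that

$$
\begin{aligned}
E\{e_b(t)e_b^T(s)\} &= E\big\{E\{e_b(t)e_b^T(s) \mid \mathcal{Y}_{s+1}^+\}\big\} \\
&= E\big\{e_b(t)E\{e_b^T(s) \mid \mathcal{Y}_{s+1}^+\}\big\} = 0
\end{aligned}
$$

Similarly, one can show that the above equality holds for $s > t$, implying that $e_b(t)$ and $e_b(s)$ are uncorrelated for $s \ne t$. This shows that $e_b$ is a zero mean white noise.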
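We compute the covariance matrix of $e_b$. It follows from (5.76b) and (5.81) that

$$
\begin{aligned}
e_b(t) &= y(t) - E\{\bar Cx_b(t) + v_b(t) \mid \mathcal{Y}_{t+1}^+\} \\
&= y(t) - \bar C\hat x_b(t) = \bar C[x_b(t) - \hat x_b(t)] + v_b(t)
\end{aligned}
$$

Hence, noting that $v_b(t)$ is uncorrelated with $x_b(t)$ and $\hat x_b(t) \in \mathcal{Y}_{t+1}^+$, we have

$$
\begin{aligned}
\operatorname{cov}\{e_b(t)\} &= \bar CE\{[x_b(t) - \hat x_b(t)][x_b(t) - \hat x_b(t)]^T\}\bar C^T + E\{v_b(t)v_b^T(t)\} \\
&= \bar CE\{x_b(t)x_b^T(t)\}\bar C^T - \bar CE\{x_b(t)\hat x_b^T(t)\}\bar C^T \\
&\quad - \bar CE\{\hat x_b(t)x_b^T(t)\}\bar C^T + \bar CE\{\hat x_b(t)\hat x_b^T(t)\}\bar C^T + R_b
\end{aligned} \tag{5.83}
$$

Since, from (5.80), $x_b(t) - \hat x_b(t) \perp \hat x_b(t)$, we have

$$
E\{x_b(t)\hat x_b^T(t)\} = E\{\hat x_b(t)\hat x_b^T(t)\} = \bar\Sigma \tag{5.84}
$$

Applying this relation and (5.78) to (5.83) yields

$$
\begin{aligned}
\operatorname{cov}\{e_b(t)\} &= \bar C\bar\Pi\bar C^T - \bar C\bar\Sigma\bar C^T + \Lambda(0) - \bar C\bar\Pi\bar C^T \\
&= \Lambda(0) - \bar C\bar\Sigma\bar C^T
\end{aligned}
$$

This completes the proof.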


The backward Kalman filter is given by the next theorem. 


Theorem 5.5. (Backward Kalman filter) The backward Kalman filter equations for the backward Markov model are given by (i) to (iv).

(i) The filter equation

$$
\hat x_b(t-1) = A^T\hat x_b(t) + \bar K[y(t) - \bar C\hat x_b(t)] \tag{5.85}
$$

where $A^T - \bar K\bar C$ is stable.

(ii) The innovation process

$$
e_b(t) = y(t) - \bar C\hat x_b(t) \tag{5.86}
$$

(iii) The backward Kalman gain

$$
\bar K = (C^T - A^T\bar\Sigma\bar C^T)(\Lambda(0) - \bar C\bar\Sigma\bar C^T)^{-1} \tag{5.87}
$$

(iv) The ARE satisfied by the covariance matrix of the backward predicted estimate $\hat x_b(t)$ is given by

$$
\bar\Sigma = A^T\bar\Sigma A + (C^T - A^T\bar\Sigma\bar C^T)(\Lambda(0) - \bar C\bar\Sigma\bar C^T)^{-1}(C - \bar C\bar\Sigma A) \tag{5.88}
$$

This is the dual ARE of (5.75), satisfied by the covariance matrix $\bar\Sigma = \operatorname{cov}\{\hat x_b(t)\}$.
Proof. It follows from (5.76a) and Lemma 5.13 that

$$
\begin{aligned}
E\{x_b(t-1) \mid \mathcal{Y}_t^+\} &= E\{x_b(t-1) \mid \mathcal{Y}_{t+1}^+ \oplus e_b(t)\} \\
&= E\{A^Tx_b(t) + w_b(t) \mid \mathcal{Y}_{t+1}^+\} + E\{x_b(t-1) \mid e_b(t)\} \\
&= A^TE\{x_b(t) \mid \mathcal{Y}_{t+1}^+\} + E\{x_b(t-1) \mid e_b(t)\}
\end{aligned}
$$

Thus from (5.80), we get (5.85), where the backward Kalman gain is determined by

$$
\bar K = E\{x_b(t-1)e_b^T(t)\}\big(\operatorname{cov}\{e_b(t)\}\big)^{-1}
$$

It follows from (5.78) and (5.84) that

$$
\begin{aligned}
E\{x_b(t-1)e_b^T(t)\} &= E\big\{[A^Tx_b(t) + w_b(t)]\big[\bar C(x_b(t) - \hat x_b(t)) + v_b(t)\big]^T\big\} \\
&= A^T\bar\Pi\bar C^T - A^T\bar\Sigma\bar C^T + S_b \\
&= C^T - A^T\bar\Sigma\bar C^T
\end{aligned}
$$

Thus the backward Kalman gain is given by (5.87). Finally, the dual ARE (5.88) can be easily derived by computing the covariance matrix of (5.85).


In view of Theorem 5.5, the backward innovation model is given by

$$
\hat x_b(t-1) = A^T\hat x_b(t) + \bar Ke_b(t) \tag{5.89a}
$$

$$
y(t) = \bar C\hat x_b(t) + e_b(t) \tag{5.89b}
$$
This should be compared with the forward innovation model of (5.73). 
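A minimal NumPy sketch of the backward filter, under the same illustrative assumptions as the forward examples, is given below. It forms $\Pi$, $\bar C$ and $\Lambda(0)$ from a hypothetical forward model, iterates the dual ARE (5.88) from $\bar\Sigma = 0$, and computes $\bar K$ by (5.87). The checks confirm that $A^T - \bar K\bar C$ is stable, that the innovation covariance (5.82) is positive definite, and that $\bar\Sigma \le \bar\Pi = \Pi^{-1}$.

```python
import numpy as np

A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])
n = A.shape[0]

Pi = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
Cbar = (A @ Pi @ C.T + S).T          # Cbar = C Pi A^T + S^T
Lam0 = C @ Pi @ C.T + R              # Lambda(0)
Pi_bar = np.linalg.inv(Pi)           # cov{x_b(t)} = Pi^{-1}

Sigma_bar = np.zeros((n, n))
for _ in range(2000):                # dual ARE (5.88), iterated from zero
    G = C.T - A.T @ Sigma_bar @ Cbar.T
    D = Lam0 - Cbar @ Sigma_bar @ Cbar.T
    Sigma_bar = A.T @ Sigma_bar @ A + G @ np.linalg.inv(D) @ G.T

cov_eb = Lam0 - Cbar @ Sigma_bar @ Cbar.T                          # (5.82)
K_bar = (C.T - A.T @ Sigma_bar @ Cbar.T) @ np.linalg.inv(cov_eb)   # (5.87)
rho = max(abs(np.linalg.eigvals(A.T - K_bar @ Cbar)))              # stability in (5.85)
print(K_bar.ravel(), rho)
```

Only the quintuple $(\Pi^{-1}, A^T, \bar C, C, \Lambda(0))$ of the backward model enters the loop, mirroring how the forward ARE (5.75) uses $(\Pi, A, C, \bar C, \Lambda(0))$.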
We are now in a position to summarize the different Markov models for a stationary process $y$, including the forward and backward Markov models defined in Sections 4.7 and 4.8, and the forward and backward innovation models obtained in Sections 5.6 and 5.7 through the stationary Kalman filters.


Table 5.1. Schematic diagram of different Markov models

Forward model $(\Pi, A, C, \bar C, \Lambda(0))$ → Kalman filter → Forward innovation model $(\Sigma, A, K, C, \Lambda(0))$
↕
Backward model $(\Pi^{-1}, A^T, \bar C, C, \Lambda(0))$ → Backward Kalman filter → Backward innovation model $(\bar\Sigma, A^T, \bar K, \bar C, \Lambda(0))$


Table 5.1 displays a schematic diagram of different Markov models for the same stationary process $y$ with the covariance matrix $\Lambda_{yy}(l)$. From Lemmas 4.10 and 4.11, we see that $(A, C, Q, S, R)$ determines $(\Pi, A, C, \bar C, \Lambda(0))$, and vice versa. Hence, we say that the stationary forward model (4.70) with $t_0 \to -\infty$ is characterized by the quintuple $(\Pi, A, C, \bar C, \Lambda(0))$, where $\Lambda(0) := \Lambda_{yy}(0)$. By a similar argument, the backward Markov model (4.83) is specified by $(\Pi^{-1}, A^T, \bar C, C, \Lambda(0))$. On the other hand, the forward innovation model (5.73) is characterized by the quintuple $(\Sigma, A, K, C, \Lambda(0))$, and the backward innovation model by $(\bar\Sigma, A^T, \bar K, \bar C, \Lambda(0))$; note, however, that $\bar\Sigma \ne \Sigma^{-1}$.


5.8 Numerical Solution of ARE 


The stabilizing solution $\Sigma$ of the ARE (5.75) can be obtained from a solution of the generalized eigenvalue problem (GEP) associated with the ARE. Consider the ARE given by (5.75), i.e.,

$$
\Sigma = A\Sigma A^T + (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}(\bar C - C\Sigma A^T) \tag{5.90}
$$

where the Kalman gain is given by (5.74).
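Define $F := A - \bar C^T\Lambda^{-1}(0)C$. Then, (5.90) is rewritten as (see Problem 5.7)

$$
\Sigma = F\Sigma F^T + F\Sigma C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T + \bar C^T\Lambda^{-1}(0)\bar C \tag{5.91}
$$

Associated with (5.91), we define the GEP

$$
\begin{bmatrix} F^T & 0 \\ -\bar C^T\Lambda^{-1}(0)\bar C & I_n\end{bmatrix}\begin{bmatrix} z_1 \\ z_2\end{bmatrix} = \lambda\begin{bmatrix} I_n & -C^T\Lambda^{-1}(0)C \\ 0 & F\end{bmatrix}\begin{bmatrix} z_1 \\ z_2\end{bmatrix} \tag{5.92}
$$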
Suppose that there are no eigenvalues on the unit circle ($|z| = 1$). Then, we can show that if $\lambda \ne 0$ is an eigenvalue, then its inverse $1/\lambda$ is also an eigenvalue (see Problem 5.8). Hence, (5.92) has $2n$ eigenvalues, $n$ of which are stable and the other $n$ unstable.


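Let $U = \begin{bmatrix} U_1 \\ U_2\end{bmatrix} \in \mathbb{C}^{2n\times n}$ be the matrix formed by the eigenvectors corresponding to the $n$ stable eigenvalues of (5.92). Thus, we have

$$
\begin{bmatrix} F^T & 0 \\ -\bar C^T\Lambda^{-1}(0)\bar C & I_n\end{bmatrix}\begin{bmatrix} U_1 \\ U_2\end{bmatrix} = \begin{bmatrix} I_n & -C^T\Lambda^{-1}(0)C \\ 0 & F\end{bmatrix}\begin{bmatrix} U_1 \\ U_2\end{bmatrix}J_0 \tag{5.93}
$$

where all the eigenvalues of $J_0 \in \mathbb{C}^{n\times n}$ are stable.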


Lemma 5.14. Suppose that the GEP (5.92) has no eigenvalues on the unit circle. Also, suppose that $\det(U_1) \ne 0$ and that $\Lambda(0) - C\Sigma C^T > 0$. Then, the stabilizing solution of the ARE (5.90) is given by the formula

$$
\Sigma = U_2U_1^{-1} \tag{5.94}
$$


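Proof. [124] We show that $\Sigma = U_2U_1^{-1}$ is a solution of the ARE (5.91). From (5.93), we get

$$
F^TU_1 = U_1J_0 - C^T\Lambda^{-1}(0)CU_2J_0
$$

$$
FU_2J_0 = -\bar C^T\Lambda^{-1}(0)\bar CU_1 + U_2
$$

Post-multiplying the above equations by $U_1^{-1}$ yields

$$
F^T = U_1J_0U_1^{-1} - C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1} \tag{5.95a}
$$

$$
U_2U_1^{-1} = FU_2J_0U_1^{-1} + \bar C^T\Lambda^{-1}(0)\bar C \tag{5.95b}
$$

From (5.94) and (5.95b),

$$
\begin{aligned}
\mathrm{Ric}(\Sigma) &:= F\Sigma F^T - \Sigma + F\Sigma C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T + \bar C^T\Lambda^{-1}(0)\bar C \\
&= F\Sigma F^T - FU_2J_0U_1^{-1} + F\Sigma C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T
\end{aligned}
$$

Also, from (5.95a), we have

$$
\begin{aligned}
\mathrm{Ric}(\Sigma) &= FU_2U_1^{-1}[U_1J_0U_1^{-1} - C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1}] - FU_2J_0U_1^{-1} \\
&\quad + FU_2U_1^{-1}C^T(\Lambda(0) - CU_2U_1^{-1}C^T)^{-1}CU_2U_1^{-1}[U_1J_0U_1^{-1} - C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1}] \\
&= -FU_2U_1^{-1}C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1} \\
&\quad + FU_2U_1^{-1}C^T(\Lambda(0) - CU_2U_1^{-1}C^T)^{-1}CU_2J_0U_1^{-1} \\
&\quad - FU_2U_1^{-1}C^T(\Lambda(0) - CU_2U_1^{-1}C^T)^{-1}CU_2U_1^{-1}C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1}
\end{aligned}
$$

Define $\alpha := FU_2U_1^{-1}C^T$, $\beta := CU_2J_0U_1^{-1}$ and $\gamma := CU_2U_1^{-1}C^T$. Then, it follows that

$$
\begin{aligned}
\mathrm{Ric}(\Sigma) &= -\alpha\Lambda^{-1}(0)\beta + \alpha(\Lambda(0) - \gamma)^{-1}\beta - \alpha(\Lambda(0) - \gamma)^{-1}\gamma\Lambda^{-1}(0)\beta \\
&= -\alpha\big[\Lambda^{-1}(0) - (\Lambda(0) - \gamma)^{-1} + (\Lambda(0) - \gamma)^{-1}\gamma\Lambda^{-1}(0)\big]\beta \\
&= -\alpha(\Lambda(0) - \gamma)^{-1}\big[(\Lambda(0) - \gamma)\Lambda^{-1}(0) - I + \gamma\Lambda^{-1}(0)\big]\beta = 0
\end{aligned}
$$

as was to be proved.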

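Finally we show that the closed loop matrix $A_K := A - KC$ is stable. Recall that $K = (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}$. Since $A = F + \bar C^T\Lambda^{-1}(0)C$, we see that

$$
\begin{aligned}
A_K^T &= F^T + C^T\Lambda^{-1}(0)\bar C - C^T(\Lambda(0) - C\Sigma C^T)^{-1}\big(\bar C - C\Sigma[F^T + C^T\Lambda^{-1}(0)\bar C]\big) \\
&= F^T + C^T\Lambda^{-1}(0)\bar C - C^T(\Lambda(0) - C\Sigma C^T)^{-1}(\Lambda(0) - C\Sigma C^T)\Lambda^{-1}(0)\bar C \\
&\quad + C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T
\end{aligned}
$$

It is easy to see that the second and third terms in the right-hand side of the above equation cancel, so that

$$
A_K^T = F^T + C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T \tag{5.96}
$$

Substituting $F^T$ of (5.95a) into (5.96), with $\Sigma = U_2U_1^{-1}$, yields

$$
\begin{aligned}
A_K^T &= U_1J_0U_1^{-1} - C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1} \\
&\quad + C^T(\Lambda(0) - CU_2U_1^{-1}C^T)^{-1}CU_2U_1^{-1}[U_1J_0U_1^{-1} - C^T\Lambda^{-1}(0)CU_2J_0U_1^{-1}] \\
&= U_1J_0U_1^{-1} + C^T\big[(\Lambda(0) - CU_2U_1^{-1}C^T)^{-1} - \Lambda^{-1}(0) \\
&\qquad - (\Lambda(0) - CU_2U_1^{-1}C^T)^{-1}CU_2U_1^{-1}C^T\Lambda^{-1}(0)\big]CU_2J_0U_1^{-1} \\
&= U_1J_0U_1^{-1}
\end{aligned}
$$

Thus the eigenvalues of $A_K$ are equal to those of $J_0$. This completes the proof.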
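Lemma 5.14 translates into a short computation. The NumPy sketch below forms the matrix pencil of (5.92) for an illustrative (hypothetical) system, keeps the eigenvectors of the $n$ stable eigenvalues, and returns $\Sigma = U_2U_1^{-1}$. It then compares the result with the limit of the iteration of (5.75). As a simplifying assumption, $F$ is taken to be nonsingular, so the GEP can be reduced to an ordinary eigenvalue problem for $N^{-1}L$ (with $L$ and $N$ denoting the left and right matrices in (5.92)). A QZ-based generalized eigensolver would avoid that assumption.

```python
import numpy as np

A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])
n = A.shape[0]

# Covariance data (Pi, Cbar, Lambda(0)) of the illustrative forward model.
Pi = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
Cbar_T = A @ Pi @ C.T + S
Lam0 = C @ Pi @ C.T + R
Lam0_inv = np.linalg.inv(Lam0)

F = A - Cbar_T @ Lam0_inv @ C
I, Z = np.eye(n), np.zeros((n, n))
L = np.block([[F.T, Z], [-Cbar_T @ Lam0_inv @ Cbar_T.T, I]])   # left matrix of (5.92)
N = np.block([[I, -C.T @ Lam0_inv @ C], [Z, F]])               # right matrix of (5.92)

# Assumption: F (hence N) is nonsingular, so L z = lambda N z <=> (N^{-1} L) z = lambda z.
lam, V = np.linalg.eig(np.linalg.solve(N, L))
stable = np.abs(lam) < 1.0
U1, U2 = V[:n, stable], V[n:, stable]
Sigma_gep = np.real(U2 @ np.linalg.inv(U1))                    # (5.94)

# Reference: iterate (5.75) from Sigma = 0.
Sigma = np.zeros((n, n))
for _ in range(1000):
    G = Cbar_T - A @ Sigma @ C.T
    Sigma = A @ Sigma @ A.T + G @ np.linalg.inv(Lam0 - C @ Sigma @ C.T) @ G.T

K = (Cbar_T - A @ Sigma_gep @ C.T) @ np.linalg.inv(Lam0 - C @ Sigma_gep @ C.T)  # (5.74)
print(np.sort(np.abs(lam)), np.max(np.abs(Sigma_gep - Sigma)))
```

The sorted moduli printed first come in reciprocal pairs, as Problem 5.8 predicts. Taking the real part is safe here because complex stable eigenvalues appear in conjugate pairs, which makes $U_2U_1^{-1}$ real up to round-off.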


5.9 Notes and References 


• There is a vast literature on the Kalman filter; readers should first consult the basic papers due to Kalman [81] and Kalman and Bucy [84], and then proceed to a survey paper [78], books [11, 23, 66, 79], and a recent monograph on adaptive filtering [139], etc.


• Section 5.1 reviews the multivariate Gaussian probability density function based on [14]; see also books [79, 111, 136] for the least-squares estimation (minimum variance estimation). The state estimation problem for a Markov model is defined in Section 5.2, and the Kalman filter algorithm is derived in Section 5.3 based on the technique of orthogonal projection; see [11, 23, 66]. Also, in Section 5.4, the Kalman filter in the presence of external inputs is developed by extending the result of [182] to a discrete-time system.


• Section 5.5 derives the Riccati equation satisfied by the covariance matrix $\Sigma(t)$ of the one-step predicted estimate, which is a companion of the Riccati equation satisfied by the error covariance matrix. It should be noted that the Riccati equation for $\Sigma(t)$ is defined by using only the covariance data of the output process $y$. Thus, if the covariance information of $y$ is the same, the Kalman gain is the same even if the state space realizations are different [11, 146]. This fact is called the invariance of the Kalman filter with respect to the signal model.


• The stationary Kalman filter and the associated ARE are derived in Section 5.6, where the existence of a stabilizing solution of the discrete-time ARE is discussed. For proofs, see [97, 117] and monographs [20, 99].
• In Section 5.7, the backward Kalman filter is introduced based on a backward Markov model for a stationary process, and the relation among four different Markov models for a stationary process is briefly discussed. These Markov models will play important roles in the stochastic realization problems to be studied in Chapters 7 and 8.


• Section 5.8 is devoted to a direct solution method of the ARE (5.75) due to [103, 124], in which numerical methods for the Kalman filter ARE (5.70) are developed in terms of the solution of a GEP. See also the monograph [125], which includes various numerical algorithms arising in the control area.


5.10 Problems 


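5.1 Suppose that the probability density function of $(x, y)$ is given by the Gaussian density function

$$
p(x, y) = \frac{1}{2\pi(1 - \rho^2)^{1/2}\sigma_x\sigma_y}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2}\right]\right\}
$$

where $\sigma_x, \sigma_y > 0$ and $|\rho| < 1$. Show that the following relations hold:

$$
E\{x \mid y\} = \mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y), \quad E\{(x - E\{x \mid y\})^2\} = \sigma_x^2(1 - \rho^2)
$$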
y 


5.2 Let $K_\alpha(t)$ be the Kalman gain for the covariance matrices $\alpha Q(t)$, $\alpha S(t)$, $\alpha R(t)$, $\alpha P(0)$ with $\alpha > 0$. By using (5.41a) and (5.42a), show that $K_\alpha(t)$ is the same as $K(t)$ of (5.41a).

5.3 Define the state covariance matrix $\Pi(t) = E\{[x(t) - \mu_x(t)][x(t) - \mu_x(t)]^T\}$. Show that the following inequalities hold:

$$
\Pi(t) \ge P(t) \ge P(t \mid t) \ge 0, \quad \Pi(t) \ge \Sigma(t)
$$
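5.4 Consider an AR model

$$
y(t) = a_1y(t-1) + \cdots + a_ny(t-n) + w(t)
$$

Then a state space model for $y$ is given by

$$
x(t+1) = \begin{bmatrix} 0 & 0 & \cdots & 0 & a_n \\ 1 & 0 & \cdots & 0 & a_{n-1} \\ 0 & 1 & \cdots & 0 & a_{n-2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & a_1\end{bmatrix}x(t) + \begin{bmatrix} a_n \\ a_{n-1} \\ a_{n-2} \\ \vdots \\ a_1\end{bmatrix}w(t) \tag{5.97a}
$$

$$
y(t) = \begin{bmatrix} 0 & \cdots & 0 & 1\end{bmatrix}x(t) + w(t) \tag{5.97b}
$$

Derive the Kalman filter for the above state space model, and show that the Kalman filter is the same as the state space model. (Hint: The state variables of (5.97a) are expressed as

$$
x_1(t) = a_ny(t-1), \quad x_2(t) = x_1(t-1) + a_{n-1}y(t-1), \quad \ldots, \quad x_n(t) = x_{n-1}(t-1) + a_1y(t-1)
$$

Thus we see that the state vector $x(t) := (x_1(t), \ldots, x_n(t))^T$, $t \ge n$, can be determined from $y(k)$, $k = t-1, \ldots, t-n$, so that we have $P(t) = 0$ for $t \ge n$, and hence $\Pi(t) = \Sigma(t)$. It also follows from $R(t) = q$ and $S(t) = Bq$, where $q$ is the variance of $w$ and $B$ is the input vector in (5.97a), that $K(t) = B$ for $t \ge n$.)

5.5 By using (5.44) with $A(t) = A$, $C(t) = C$, show that

$$
y(t) = e(t) + CK(t-1)e(t-1) + CAK(t-2)e(t-2) + \cdots + CA^{t-1}K(0)e(0) + CA^t\hat x(0)
$$

and that

$$
\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(t-1)\end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{t-1}\end{bmatrix}\hat x(0) + \begin{bmatrix} I & & & \\ CK(0) & I & & \\ \vdots & & \ddots & \\ CA^{t-2}K(0) & \cdots & CK(t-2) & I\end{bmatrix}\begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(t-1)\end{bmatrix}
$$

This is useful for giving a triangular factorization of the covariance matrix of the stacked output vector.

5.6 In Section 5.6, we defined $\Phi = A - SR^{-1}C$ and $\Gamma = \Phi PC^T(CPC^T + R)^{-1}$. Show that $\Phi - \Gamma C = A - KC$ holds. Also, derive (5.70) from (5.67).

5.7 Derive (5.91) from (5.90).

5.8 Consider the GEP of (5.92):

$$
Lz = \lambda Nz, \quad L = \begin{bmatrix} F^T & 0 \\ -\bar C^T\Lambda^{-1}(0)\bar C & I_n\end{bmatrix}, \quad N = \begin{bmatrix} I_n & -C^T\Lambda^{-1}(0)C \\ 0 & F\end{bmatrix}
$$

Let $J = \begin{bmatrix} 0 & -I_n \\ I_n & 0\end{bmatrix}$. Show that $LJL^T = NJN^T$ holds. By using this fact, show that if $\lambda \ne 0$ is an eigenvalue of (5.92), so is $1/\lambda$.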


Part II 


Realization Theory 


6 


Realization of Deterministic Systems 


We introduce the basic idea of deterministic subspace identification methods for a 
discrete-time LTI system from the point of view of classical realization theory. We 
first present the realization algorithm due to Ho-Kalman [72] based on the SVD 
of the block Hankel matrix formed by impulse responses. Then we define a data 
matrix generated by the observed input-output data for the system, and explain the 
relation between the data matrix and the block Hankel matrix by means of zero- 
input responses. Based on the LQ decomposition of data matrices, we develop two 
subspace identification methods, i.e., the MOESP method [172] and N4SID method 
[164, 165]. Finally, we consider the effect of additive white noise on the SVD of a 
wide rectangular matrix. Some numerical results are included. 


6.1 Realization Problems 


Consider a discrete-time LTI system described by 


$$
x(t+1) = Ax(t) + Bu(t) \tag{6.1a}
$$

$$
y(t) = Cx(t) + Du(t), \quad t = 0, 1, \ldots \tag{6.1b}
$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ the control input, $y \in \mathbb{R}^p$ the output vector, and $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{p\times n}$, $D \in \mathbb{R}^{p\times m}$ are constant matrices. In the following, we assume that $(A, B)$ is reachable and $(C, A)$ is observable; in this case, we say that $(A, B, C)$ is minimal.

From (6.1), the transfer matrix and the impulse response matrices of the system 
are respectively given by 


$$
G(z) = D + C(zI - A)^{-1}B \tag{6.2}
$$

and

$$
G_t = \begin{cases} D, & t = 0 \\ CA^{t-1}B, & t = 1, 2, \ldots\end{cases} \tag{6.3}
$$

where $\{G_t, t = 0, 1, \ldots\}$ are also called the Markov parameters. We see that, given $(A, B, C, D)$, the transfer matrix and the impulse response matrices are uniquely determined by (6.2) and (6.3), respectively (see also Section 3.4).

This chapter considers the inverse problems called realization problems [72]. 


Problem A Suppose that a sequence of impulse responses $\{G_t, t = 0, 1, \ldots\}$, or a transfer matrix $G(z)$, of a discrete-time LTI system is given. The realization problem is to find the dimension $n$ and the system matrices $(A, B, C, D)$, up to similarity transforms.

Problem B Suppose that input-output data $\{u(t), y(t), t = 0, 1, \ldots, N-1\}$ are given. The problem is to identify the dimension $n$ and the system matrices $(A, B, C, D)$, up to similarity transforms. This is exactly a subspace identification problem for a deterministic LTI system.


6.2 Ho-Kalman’s Method 


In this section, we present the realization method of Ho-Kalman based on the results 
stated in Section 3.9, providing a complete solution to Problem A. Let the impulse response of the system be given by $(G_0, G_1, G_2, \ldots)$. Then, since $D = G_0$, we must identify three matrices $(A, B, C)$.

Consider the input $u$ that assumes nonzero values up to time $t = -1$ and zero for $t = 0, 1, \ldots$, i.e.,

$$
u = (\ldots, u(-3), u(-2), u(-1), 0, 0, 0, \ldots) \tag{6.4}
$$

Applying this input to a discrete-time LTI system, we observe the output for $t = 0, 1, \ldots$ as shown in Figure 6.1. For the input sequence of (6.4), the output is expressed as

$$
y(t) = \sum_{i=-\infty}^{-1}G_{t-i}u(i), \quad t = 0, 1, \ldots \tag{6.5}
$$

This is a zero-input response with the initial state $x(0)$, which is determined by the past inputs. It should be noted that the responses $y(t)$ for $t = -1, -2, \ldots$ are shown by dotted lines.
We define the block Hankel operator with infinite dimension as 


pes Go GaGa | 

Go G3 Ga Gs --: 

H — | G3 Ga Gs Ge --: (6.6) 
G4 Gs Ge G7-::: 


Then the input-output relation is expressed as 
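Figure 6.1. Zero-input response of an LTI system

$$
y_+ = \mathcal{H}u_- \tag{6.7}
$$

where $y_+$ and $u_-$ are infinite dimensional vectors defined by

$$
y_+ = \begin{bmatrix} y(0) \\ y(1) \\ \vdots\end{bmatrix}, \quad u_- = \begin{bmatrix} u(-1) \\ u(-2) \\ \vdots\end{bmatrix}
$$

Moreover, the observability and reachability operators are defined by

$$
\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots\end{bmatrix}, \quad \mathcal{C} = \begin{bmatrix} B & AB & A^2B & \cdots\end{bmatrix}
$$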

We present the basic theorem for the properties of block Hankel matrix, which 
plays an important role in the later developments. 


Theorem 6.1. (Properties of Hankel matrix) Suppose that $(A, B, C)$ is minimal. Then, the following (i) to (iv) hold.

(i) The block Hankel matrix $\mathcal{H}$ of (6.6) has finite rank if and only if the impulse response has a factorization like (6.3).

(ii) The block Hankel matrix has rank $n$, i.e., $\operatorname{rank}(\mathcal{H}) = n$. Moreover, $\mathcal{H}$ has the factorization of the form

$$
\mathcal{H} = \mathcal{O}\mathcal{C} = \mathcal{O}TT^{-1}\mathcal{C}, \quad |T| \ne 0
$$

(iii) Let the state vector at $t = 0$ be given by $x(0) = \mathcal{C}u_-$. Then (6.7) is written as

$$
y_+ = \mathcal{O}x(0) \tag{6.8}
$$

(iv) The block Hankel matrix is shift invariant, i.e.,

$$
\mathcal{H}^{\uparrow} = \mathcal{O}^{\uparrow}\mathcal{C} = \mathcal{O}A\mathcal{C} = \mathcal{O}\mathcal{C}^{\leftarrow} = \mathcal{H}^{\leftarrow}
$$

where $(\cdot)^{\uparrow}$ denotes the upward shift that removes the first block row, and $(\cdot)^{\leftarrow}$ the left shift that removes the first block column.




Proof. Item (i) follows from Theorem 3.13, and items (ii), (iii), (iv) are obvious 
from the definition. See also [162]. 


Item (iv) has the following physical meaning. From (6.7), we see that $\operatorname{Im}(\mathcal{H})$ contains all the outputs after $t = 0$ due to the past inputs up to $t = -1$. Define

$$
y_+^{\uparrow} = \begin{bmatrix} y(1) \\ y(2) \\ \vdots\end{bmatrix}
$$

Then, we have

$$
y_+^{\uparrow} = \mathcal{H}^{\uparrow}u_- \tag{6.9}
$$

Hence, it follows from (6.9) that $\operatorname{Im}(\mathcal{H}^{\uparrow})$ contains all possible outputs after $t = 1$ due to the past inputs up to $t = -1$. Since the system is time-invariant, this is equivalent to saying that $\operatorname{Im}(\mathcal{H}^{\uparrow})$ contains the outputs after $t = 0$ due to the past inputs up to $t = -2$. Since the set of all inputs up to $t = -2$ is a subspace of the space of all the past inputs, all the resulting outputs $\operatorname{Im}(\mathcal{H}^{\uparrow})$ should be included in $\operatorname{Im}(\mathcal{H})$.

The above properties of the block Hankel operator are extensively used for de- 
riving realization algorithms. In fact, the celebrated Ho-Kalman algorithm described 
below is entirely based on Theorem 6.1. 

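For the actual computation using a finite number of impulse response matrices, however, we must use the truncated block Hankel matrix of the form

$$
H_{k,l} = \begin{bmatrix} G_1 & G_2 & \cdots & G_l \\ G_2 & G_3 & \cdots & G_{l+1} \\ G_3 & G_4 & \cdots & G_{l+2} \\ \vdots & \vdots & \ddots & \vdots \\ G_k & G_{k+1} & \cdots & G_{k+l-1}\end{bmatrix} \in \mathbb{R}^{kp\times lm} \tag{6.10}
$$

Also, the extended observability matrix $\mathcal{O}_k$ and the extended reachability matrix $\mathcal{C}_l$ are defined by

$$
\mathcal{O}_k = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1}\end{bmatrix}, \quad \mathcal{C}_l = \begin{bmatrix} B & AB & \cdots & A^{l-1}B\end{bmatrix}
$$

where $k$ and $l$ should be greater than $n$¹. Usually, we take $n < k \le l$.

In the finite dimensional case, if $\operatorname{rank}(H_{k,l}) = n$, we see from Theorem 6.1 (ii) that

$$
H_{k,l} = \mathcal{O}_k\mathcal{C}_l = \mathcal{O}_kTT^{-1}\mathcal{C}_l, \quad |T| \ne 0 \tag{6.11}
$$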


¹ In practice, the dimension $n$ is not known. Since it is impossible to find an upper bound of $n$ by a finite procedure, it is necessary to assume an upper bound a priori.
where $\operatorname{rank}(\mathcal{O}_k) = n$ and $\operatorname{rank}(\mathcal{C}_l) = n$. Also, concerning the extended observability matrix, we have the following identity

$$
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-2}\end{bmatrix}A = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{k-1}\end{bmatrix} \iff \mathcal{O}_{k-1}A = \mathcal{O}_k(p+1 : kp, :) \tag{6.12}
$$

To get a unique least-squares solution of $A$ from (6.12), $\mathcal{O}_{k-1}$ should have full column rank, so that $p(k-1) \ge n$. Thus, for a single output case ($p = 1$), the relation turns out to be $k \ge n + 1$, so that $k$ should be strictly greater than $n$. Similarly, from the extended reachability matrix, we have

$$
A\mathcal{C}_{l-1} = \mathcal{C}_l(:, m+1 : lm) \tag{6.13}
$$


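Lemma 6.1. (Deterministic realization algorithm [72, 184])

Step 1: Compute the SVD of $H_{k,l}$ as

$$
H_{k,l} = \begin{bmatrix} U_n & U_s\end{bmatrix}\begin{bmatrix} \Sigma_n & 0 \\ 0 & 0\end{bmatrix}\begin{bmatrix} V_n^T \\ V_s^T\end{bmatrix} = U_n\Sigma_nV_n^T \tag{6.14}
$$

where $\Sigma_n$ is a diagonal matrix with the first $n$ nonzero singular values of $H_{k,l}$, so that we have

$$
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0 = \sigma_{n+1} = \sigma_{n+2} = \cdots
$$

Step 2: Compute the extended observability and reachability matrices as

$$
\mathcal{O}_k = U_n\Sigma_n^{1/2}T, \quad \mathcal{C}_l = T^{-1}\Sigma_n^{1/2}V_n^T \tag{6.15}
$$

where $T \in \mathbb{R}^{n\times n}$ is an arbitrary nonsingular matrix.

Step 3: Compute the matrices $A$, $B$, $C$ as

$$
A = \mathcal{O}_{k-1}^{\dagger}\mathcal{O}_k^{\uparrow}, \quad B = \mathcal{C}_l(1:n, 1:m), \quad C = \mathcal{O}_k(1:p, 1:n) \tag{6.16}
$$

where $\mathcal{O}_{k-1} = \mathcal{O}_k(1 : (k-1)p, 1:n)$, $\mathcal{O}_k^{\uparrow} = \mathcal{O}_k(p+1 : kp, 1:n)$, and $(\cdot)^{\dagger}$ denotes the pseudo-inverse.

For the computation of $A$, the identity (6.12) is used. It follows from (6.13) that the matrix $A$ in Step 3 is also given by

$$
A = \mathcal{C}_l^{\leftarrow}\mathcal{C}_{l-1}^{\dagger}, \quad \mathcal{C}_l^{\leftarrow} = \mathcal{C}_l(:, m+1 : lm) \tag{6.17}
$$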
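The three steps of Lemma 6.1 can be sketched in NumPy as follows, with $T = I_n$ in Step 2. The function name `ho_kalman` and the random test system are illustrative assumptions, not part of the original algorithm statement. The example builds the Markov parameters $G_t = CA^{t-1}B$ of a random system and checks that the identified triple reproduces them.

```python
import numpy as np


def ho_kalman(G, k, l, n):
    """Lemma 6.1 with T = I_n. G[t-1] holds G_t (p x m); needs k + l - 1 terms."""
    p, m = G[0].shape
    H = np.block([[G[i + j] for j in range(l)] for i in range(k)])  # H_{k,l}, (6.10)
    U, s, Vt = np.linalg.svd(H)                                     # Step 1, (6.14)
    sq = np.sqrt(s[:n])
    Ok = U[:, :n] * sq                   # O_k = U_n Sigma_n^{1/2}      (6.15)
    Cl = sq[:, None] * Vt[:n, :]         # C_l = Sigma_n^{1/2} V_n^T    (6.15)
    A = np.linalg.pinv(Ok[:-p, :]) @ Ok[p:, :]   # A = O_{k-1}^+ O_k^up  (6.16)
    B = Cl[:, :m]
    C = Ok[:p, :]
    return A, B, C, s


def markov(A, B, C, count):
    """Return [G_1, ..., G_count] with G_t = C A^{t-1} B."""
    out, Ak = [], np.eye(A.shape[0])
    for _ in range(count):
        out.append(C @ Ak @ B)
        Ak = Ak @ A
    return out


# Hypothetical test system: n = 3, m = 2, p = 2, stable A.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((3, 3))
A0 *= 0.9 / max(abs(np.linalg.eigvals(A0)))
B0 = rng.standard_normal((3, 2))
C0 = rng.standard_normal((2, 3))

G = markov(A0, B0, C0, 9)                   # G_1, ..., G_9 for k = l = 5
A, B, C, s = ho_kalman(G, k=5, l=5, n=3)
err = max(np.max(np.abs(g - gh))
          for g, gh in zip(markov(A0, B0, C0, 15), markov(A, B, C, 15)))
print(np.round(s, 6), err)
```

The identified $(A, B, C)$ differs from $(A_0, B_0, C_0)$ by a similarity transform, so only similarity-invariant quantities such as the Markov parameters are compared.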


Example 6.1. Suppose that for the impulse input $u = (1, 0, 0, \ldots)$, we observe the outputs

$$
y = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots)
$$

This output sequence is the well-known Fibonacci sequence generated by

$$
G_t = G_{t-1} + G_{t-2}, \quad t = 2, 3, \ldots
$$

with the initial conditions $G_0 = 0$, $G_1 = 1$. A realization of this impulse response sequence is given, e.g., by [143]

$$
A = \begin{bmatrix} 1 & 1 \\ 1 & 0\end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0\end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0\end{bmatrix}, \quad D = 0 \tag{6.18}
$$

and the transfer function is given by

$$
G(z) = \frac{z}{z^2 - z - 1} \tag{6.19}
$$

Now we use the algorithm of Lemma 6.1 to compute a realization. Recall that the number of block rows $k$ should be greater than $n$. Thus, taking $k = l = 5$, we have the following Hankel matrix

$$
H_{5,5} = \begin{bmatrix} 1 & 1 & 2 & 3 & 5 \\ 1 & 2 & 3 & 5 & 8 \\ 2 & 3 & 5 & 8 & 13 \\ 3 & 5 & 8 & 13 & 21 \\ 5 & 8 & 13 & 21 & 34\end{bmatrix} \in \mathbb{R}^{5\times 5}
$$

By the SVD of $H_{5,5}$, we get

$$
\sigma_1 = 54.5601, \quad \sigma_2 = 0.4399, \quad \sigma_i = 0, \ i = 3, 4, 5
$$

so that we have $n = 2$. By putting $T = I_2$,

$$
A = \begin{bmatrix} 1.6179 & 0.0185 \\ 0.0185 & -0.6179\end{bmatrix}, \quad B = \begin{bmatrix} 0.8550 \\ -0.5187\end{bmatrix}, \quad C = \begin{bmatrix} 0.8550 & -0.5187\end{bmatrix}
$$

where the transfer function $G(z) = C(zI - A)^{-1}B$ is also given by (6.19).


We see that the matrices obtained above satisfy an interesting property that $A = A^T$ and $B = C^T$. If we use an asymmetric Hankel matrix, say $H_{k,l}$ with $k \ne l$, then this property does not hold; however, the correct transfer function is obtained as long as $k, l > 2$.
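Applying the same steps to the Fibonacci data gives a quick check of the numbers in Example 6.1. The NumPy sketch below is self-contained, so it repeats Steps 1 to 3 inline. It forms $H_{5,5}$, reads off the singular values, and verifies three properties of the realization with $T = I_2$: it is symmetric ($A = A^T$, $B = C^T$), its eigenvalues are the roots $(1 \pm \sqrt 5)/2$ of the denominator $z^2 - z - 1$ in (6.19), and it reproduces the sequence. The signs of individual entries may differ from those printed above, since singular vectors are determined only up to sign.

```python
import numpy as np

fib = [0, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])            # G_0, G_1, ..., G_19

k = l = 5
H = np.array([[fib[i + j + 1] for j in range(l)] for i in range(k)], dtype=float)  # H_{5,5}
U, s, Vt = np.linalg.svd(H)
n = 2                                        # two nonzero singular values
sq = np.sqrt(s[:n])
Ok = U[:, :n] * sq                           # O_5 with T = I_2
Cl = sq[:, None] * Vt[:n, :]                 # C_5 with T = I_2
A = np.linalg.pinv(Ok[:-1, :]) @ Ok[1:, :]   # (6.16) with p = 1
B = Cl[:, :1]
C = Ok[:1, :]

eig = np.sort(np.linalg.eigvals(A).real)
markov = [(C @ np.linalg.matrix_power(A, t - 1) @ B).item() for t in range(1, 16)]
print(np.round(s, 4))     # singular values of H_{5,5}
print(np.round(A, 4))
print(np.round(eig, 6))
```

Because $H_{5,5}$ is symmetric and positive semidefinite here, its left and right singular vectors for the nonzero singular values coincide, which is why the computed realization comes out symmetric (the case $S = I$ in Lemma 6.2).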


Lemma 6.2. Suppose that $H_{k,l}$ is symmetric in Lemma 6.1 and that $T = I_n$ in Step 2. If all the diagonal elements of $\Sigma_n$ are different, i.e., $\sigma_1 > \sigma_2 > \cdots > \sigma_n$, there exists a matrix $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$ such that $A = SA^TS$ and $C = B^TS$ hold.


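Proof. The fact that $H_{k,l}$ is symmetric implies that $k = l$, $m = p$ and $G_t^T = G_t$, $t = 1, 2, \ldots$. Let $H := H_{k,k}$ in (6.14), and write $U$, $\Sigma$, $V$ for $U_n$, $\Sigma_n$, $V_n$. Since $H^T = H$, it follows that $H = U\Sigma V^T = V\Sigma U^T$, so that $\operatorname{Im}(H) = \operatorname{Im}(U) = \operatorname{Im}(V)$. Thus there exists a nonsingular matrix $S \in \mathbb{R}^{n\times n}$ such that $U = VS$. Since $I_n = U^TU = S^TV^TVS = S^TS$, we see that $S = V^TU \in \mathbb{R}^{n\times n}$ is an orthogonal matrix. Let

$$
S = \begin{bmatrix} S_{n-1} & a \\ b^T & c\end{bmatrix} \in \mathbb{R}^{n\times n} \tag{6.20}
$$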
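where $a, b \in \mathbb{R}^{n-1}$ and $c \in \mathbb{R}$. Comparing the (2, 2)-block elements of the identities $S^TS = I_n$ and $SS^T = I_n$, we see from (6.20) that

$$
\|a\|^2 + c^2 = \|b\|^2 + c^2 = 1 \implies \|a\| = \|b\|
$$

Also, $U\Sigma V^T = V\Sigma U^T$ implies $S\Sigma = \Sigma S^T$, so that from (6.20), with $\Sigma = \operatorname{diag}(\Sigma_{n-1}, \sigma_n)$,

$$
S_{n-1}\Sigma_{n-1} = \Sigma_{n-1}S_{n-1}^T, \quad \sigma_na = \Sigma_{n-1}b
$$

From the second relation in the above equation,

$$
a = \sigma_n^{-1}\Sigma_{n-1}b = \operatorname{diag}(\alpha_1, \ldots, \alpha_{n-1})\,b
$$

where $\alpha_i = \sigma_i/\sigma_n$, $i = 1, \ldots, n-1$. Since $\|a\| = \|b\|$, we have

$$
\alpha_1^2b_1^2 + \cdots + \alpha_{n-1}^2b_{n-1}^2 = b_1^2 + \cdots + b_{n-1}^2
$$

so that $(\alpha_1^2 - 1)b_1^2 + \cdots + (\alpha_{n-1}^2 - 1)b_{n-1}^2 = 0$. But, since $\alpha_i > 1$, $i = 1, \ldots, n-1$, we have $b_i = 0$, $i = 1, \ldots, n-1$, implying that $a = b = 0$ and $c^2 = 1$. By means of these relations, we have $S = \begin{bmatrix} S_{n-1} & 0 \\ 0 & \pm 1\end{bmatrix}$ and

$$
S_{n-1}^TS_{n-1} = I_{n-1}, \quad S_{n-1}S_{n-1}^T = I_{n-1}, \quad S_{n-1}\Sigma_{n-1} = \Sigma_{n-1}S_{n-1}^T
$$

Applying the above procedure to $S_{n-1} \in \mathbb{R}^{(n-1)\times(n-1)}$, we can inductively prove that $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$, a signature matrix.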
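Since $T = I_n$ in (6.15), it follows that

$$
\mathcal{O}_k = U\Sigma^{1/2}, \quad \mathcal{C}_k = \Sigma^{1/2}V^T
$$

Also, $S$ and $\Sigma^{1/2}$ are diagonal, so that $S\Sigma^{1/2} = \Sigma^{1/2}S$. Thus, by using $V = US$,

$$
\mathcal{C}_k = \Sigma^{1/2}V^T = \Sigma^{1/2}SU^T = S\Sigma^{1/2}U^T = S\mathcal{O}_k^T
$$

Hence we get

$$
B = \mathcal{C}_k(:, 1:m) = S[\mathcal{O}_k(1:p, :)]^T = SC^T
$$

and also

$$
\mathcal{C}_k^{\leftarrow} = S(\mathcal{O}_k^{\uparrow})^T, \quad \mathcal{C}_{k-1}^{\dagger} = (\mathcal{O}_{k-1}^{\dagger})^TS
$$

Thus from (6.13), we have

$$
A = \mathcal{C}_k^{\leftarrow}\mathcal{C}_{k-1}^{\dagger} = S(\mathcal{O}_k^{\uparrow})^T(\mathcal{O}_{k-1}^{\dagger})^TS = S(\mathcal{O}_{k-1}^{\dagger}\mathcal{O}_k^{\uparrow})^TS = SA^TS
$$

and $B = SC^T$ gives $C = B^TS$, as was to be proved.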
It should be noted that $(A, B, C)$ derived in Lemma 6.2 is not balanced (see Problem 6.2).


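Example 6.2. Consider a scalar transfer function

$$
G(z) = \frac{\beta_1z^2 + \beta_2z + \beta_3}{z^3 + \alpha_1z^2 + \alpha_2z + \alpha_3} = g_1z^{-1} + g_2z^{-2} + \cdots
$$

where it is assumed that the transfer function is coprime and that the dimension is a priori known ($n = 3$). Suppose that we are given an impulse response sequence $(g_1, g_2, \ldots)$. Since $\operatorname{rank}(H) = 3$, there exists a vector $\xi = [a_3\ a_2\ a_1\ 1]^T$ such that

$$
H_{3,4}\xi = \begin{bmatrix} g_1 & g_2 & g_3 & g_4 \\ g_2 & g_3 & g_4 & g_5 \\ g_3 & g_4 & g_5 & g_6\end{bmatrix}\begin{bmatrix} a_3 \\ a_2 \\ a_1 \\ 1\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0\end{bmatrix}
$$

so that $\xi \in \operatorname{Ker}(H_{3,4})$. Let the SVD of $H_{3,4}$ be given by

$$
H_{3,4} = U\begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0\end{bmatrix}\begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \\ v_4^T\end{bmatrix}, \quad \sigma_1 \ge \sigma_2 \ge \sigma_3 > 0
$$

Hence we have

$$
H_{3,4}v_4 = 0 \implies v_4 \in \operatorname{Ker}(H_{3,4})
$$

Since both $\xi$ and $v_4$ belong to the one-dimensional subspace $\operatorname{Ker}(H_{3,4})$, we get $\xi$ by normalizing $v_4$ so that $v_4(4) = 1$.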


In view of (6.11), we have

$$
H_{3,4} = \begin{bmatrix} c \\ cA \\ cA^2\end{bmatrix}\begin{bmatrix} b & Ab & A^2b & A^3b\end{bmatrix}
$$

Since $(c, A)$ is observable, it follows from $H_{3,4}\xi = 0$ that

$$
(A^3 + a_1A^2 + a_2A + a_3I)b = 0
$$

Pre-multiplying the above equation by $A$ and $A^2$, and rearranging the terms yield

$$
(A^3 + a_1A^2 + a_2A + a_3I)\begin{bmatrix} b & Ab & A^2b\end{bmatrix} = 0
$$

By the reachability of $(A, b)$, we have $\det[b\ Ab\ A^2b] \ne 0$, and hence

$$
A^3 + a_1A^2 + a_2A + a_3I = 0
$$

Since $G(z)$ is coprime, the characteristic polynomial of $A$ coincides with the denominator polynomial of $G(z)$, so that $a_i = \alpha_i$, $i = 1, 2, 3$. Moreover, from the identity

$$
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \alpha_1 & 1 & 0 \\ \alpha_2 & \alpha_1 & 1\end{bmatrix}\begin{bmatrix} g_1 \\ g_2 \\ g_3\end{bmatrix}
$$

we get the coefficients of the numerator polynomial.
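The two computations of Example 6.2 (a null vector of $H_{3,4}$ for the denominator, then a triangular Toeplitz product for the numerator) can be sketched in NumPy. The coefficients below are a hypothetical coprime example chosen only for illustration (poles $0.9$, $0.5$, $-0.4$). The impulse response is generated from the polynomial identity $g_t + \alpha_1g_{t-1} + \alpha_2g_{t-2} + \alpha_3g_{t-3} = \beta_t$ (with $\beta_t = 0$ for $t > 3$), and the coefficients are then recovered from $g_1, \ldots, g_6$.

```python
import numpy as np

alpha = np.array([-1.0, -0.11, 0.18])   # z^3 - z^2 - 0.11 z + 0.18 = (z - 0.9)(z - 0.5)(z + 0.4)
beta = np.array([1.0, 0.5, 0.2])        # z^2 + 0.5 z + 0.2 (complex roots, no common factor)

# Impulse response: g_t = beta_t - a1 g_{t-1} - a2 g_{t-2} - a3 g_{t-3}, with g_0 = 0.
g = np.zeros(7)                          # g[t] = g_t for t = 0, ..., 6
for t in range(1, 7):
    acc = beta[t - 1] if t <= 3 else 0.0
    for i in range(1, 4):
        if t - i >= 1:
            acc -= alpha[i - 1] * g[t - i]
    g[t] = acc

H34 = np.array([[g[i + j + 1] for j in range(4)] for i in range(3)])   # H_{3,4}
_, s, Vt = np.linalg.svd(H34)
v4 = Vt[3, :]                            # spans Ker(H_{3,4})
xi = v4 / v4[3]                          # xi = [a3, a2, a1, 1]
a_hat = xi[2::-1]                        # reordered to [a1, a2, a3]

T = np.array([[1.0, 0.0, 0.0],
              [a_hat[0], 1.0, 0.0],
              [a_hat[1], a_hat[0], 1.0]])
b_hat = T @ g[1:4]                       # [beta1, beta2, beta3]
print(np.round(a_hat, 6), np.round(b_hat, 6))
```

With exact impulse response data the null vector is recovered to round-off accuracy; with noisy data, $v_4$ would instead be the right singular vector of the smallest singular value.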


We see from above examples that state space models and transfer functions of 
LTI systems can be obtained by utilizing the SVD of block Hankel matrices formed 
by the impulse response matrices. 


6.3 Data Matrices 


Consider a discrete-time LTI system, for which we assume that the system is at rest for $t < 0$, i.e., $u(t) = 0$, $y(t) = 0$, $t = -1, -2, \ldots$. Suppose that the input-output data $u = (u(0), u(1), \ldots, u(N-1))$ and $y = (y(0), y(1), \ldots, y(N-1))$ are given, where $N$ is sufficiently large. Then, for $k > 0$, neglecting the impulse responses $g_i$ with $i \ge k$, we get

$$
\begin{bmatrix} y(0) & y(1) & \cdots & y(N-1)\end{bmatrix} \simeq \begin{bmatrix} g_{k-1} & \cdots & g_1 & g_0\end{bmatrix}\begin{bmatrix} 0 & \cdots & 0 & u(0) & \cdots & u(N-k) \\ 0 & \cdots & u(0) & u(1) & \cdots & u(N-k+1) \\ \vdots & & & & & \vdots \\ u(0) & u(1) & \cdots & u(k-1) & \cdots & u(N-1)\end{bmatrix}
$$

Suppose that the wide matrix in the right-hand side formed by the inputs has full rank. Then, the impulse responses $(g_{k-1}, \ldots, g_1, g_0)$ can be obtained by solving the above equation by the least-squares method². This indicates that, under certain assumptions, we can compute a minimal realization of an LTI system by using input-output data, without using impulse responses.
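The least-squares step above can be sketched as follows. To keep the example exact, the hypothetical system is an FIR filter with $k$ taps, so neglecting $g_i$ for $i \ge k$ introduces no error. For an IIR system, the same code returns only an approximation of the leading impulse responses. The sketch uses plain NumPy, and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
k, N = 4, 200
g_true = np.array([0.5, 1.0, -0.3, 0.2])   # hypothetical g_0, ..., g_3
u = rng.standard_normal(N)
y = np.convolve(u, g_true)[:N]             # at rest for t < 0: y(t) = sum_i g_i u(t - i)

# Row j of the input matrix holds u(t - (k - 1 - j)), t = 0, ..., N - 1 (zeros before u(0)).
Umat = np.zeros((k, N))
for j in range(k):
    shift = k - 1 - j
    Umat[j, shift:] = u[:N - shift]

# Solve [y(0) ... y(N-1)] = [g_{k-1} ... g_1 g_0] Umat in the least-squares sense.
g_rev, *_ = np.linalg.lstsq(Umat.T, y, rcond=None)
g_hat = g_rev[::-1]                        # reorder to g_0, ..., g_{k-1}
print(np.round(g_hat, 6))
```

Transposing the equation turns it into a standard tall least-squares problem $U^Tx = y^T$, which is the form `numpy.linalg.lstsq` expects.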

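Suppose that the inputs and outputs

$$
(u(0)\ u(1)\ \cdots\ u(k+N-2)) \quad \text{and} \quad (y(0)\ y(1)\ \cdots\ y(k+N-2))
$$

are given, where $k$ is strictly greater than $n$, the dimension of the state vector. For these data, we form the block Hankel matrices

$$
U_{0|k-1} = \begin{bmatrix} u(0) & u(1) & \cdots & u(N-1) \\ u(1) & u(2) & \cdots & u(N) \\ \vdots & \vdots & \ddots & \vdots \\ u(k-1) & u(k) & \cdots & u(k+N-2)\end{bmatrix} \in \mathbb{R}^{km\times N}
$$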


² For the exact computation of impulse responses from the input-output data, see (6.67) in Section 6.6.
and

$$
Y_{0|k-1} = \begin{bmatrix} y(0) & y(1) & \cdots & y(N-1) \\ y(1) & y(2) & \cdots & y(N) \\ \vdots & \vdots & \ddots & \vdots \\ y(k-1) & y(k) & \cdots & y(k+N-2)\end{bmatrix} \in \mathbb{R}^{kp\times N}
$$

where the indices $0$ and $k-1$ denote the arguments of the upper-left and lower-left elements, respectively, and the number of columns of the block Hankel matrices is usually fixed as $N$, which is sufficiently large.

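We now derive matrix input-output equations, which play a fundamental role in subspace identification. By the repeated use of (6.1), we get³

$$
\begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+k-1)\end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1}\end{bmatrix}x(t) + \begin{bmatrix} D & & & \\ CB & D & & \\ \vdots & \vdots & \ddots & \\ CA^{k-2}B & CA^{k-3}B & \cdots & D\end{bmatrix}\begin{bmatrix} u(t) \\ u(t+1) \\ \vdots \\ u(t+k-1)\end{bmatrix}
$$

For notational simplicity, we define

$$
y_k(t) = \begin{bmatrix} y(t) \\ \vdots \\ y(t+k-1)\end{bmatrix} \in \mathbb{R}^{kp}, \quad u_k(t) = \begin{bmatrix} u(t) \\ \vdots \\ u(t+k-1)\end{bmatrix} \in \mathbb{R}^{km}
$$

and denote the lower block triangular Toeplitz matrix in the above equation by $\Psi_k \in \mathbb{R}^{kp\times km}$. Then we have

$$
y_k(t) = \mathcal{O}_kx(t) + \Psi_ku_k(t), \quad t = 0, 1, \ldots \tag{6.21}
$$

We see that, in terms of $u_k(t)$ and $y_k(t)$, the block Hankel matrices $U_{0|k-1}$ and $Y_{0|k-1}$ are expressed as

$$
U_{0|k-1} = \begin{bmatrix} u_k(0) & u_k(1) & \cdots & u_k(N-1)\end{bmatrix}
$$

and

$$
Y_{0|k-1} = \begin{bmatrix} y_k(0) & y_k(1) & \cdots & y_k(N-1)\end{bmatrix}
$$

It thus follows from (6.21) that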


³ This type of equation has been employed in state space identification problems in earlier papers [62, 155]; see also Problem 5.5.
$$
Y_{0|k-1} = \mathcal{O}_kX_0 + \Psi_kU_{0|k-1} \tag{6.22}
$$

where $X_0 = [x(0)\ x(1)\ \cdots\ x(N-1)] \in \mathbb{R}^{n\times N}$ is the initial state matrix. Similarly, we define

$$
U_{k|2k-1} = \begin{bmatrix} u_k(k) & u_k(k+1) & \cdots & u_k(k+N-1)\end{bmatrix}
$$

$$
Y_{k|2k-1} = \begin{bmatrix} y_k(k) & y_k(k+1) & \cdots & y_k(k+N-1)\end{bmatrix}
$$

Then, using (6.21) for $t = k, k+1, \ldots, k+N-1$, we get

$$
Y_{k|2k-1} = \mathcal{O}_kX_k + \Psi_kU_{k|2k-1} \tag{6.23}
$$

where $X_k = [x(k)\ x(k+1)\ \cdots\ x(k+N-1)] \in \mathbb{R}^{n\times N}$.

Equations (6.22) and (6.23) are the matrix input-output equations with initial states $X_0$ and $X_k$, respectively. The block Hankel matrices $U_{0|k-1}$ and $Y_{0|k-1}$ are usually called the past inputs and outputs, respectively, whereas the block Hankel matrices $U_{k|2k-1}$ and $Y_{k|2k-1}$ are called the future inputs and outputs, respectively.

We assume that the following conditions are satisfied for the exogenous input 
and the initial state matrix. 


Assumption 6.1. A1) $\operatorname{rank}(X_0) = n$.

A2) $\operatorname{rank}(U_{0|k-1}) = km$, where $k > n$.

A3) $\operatorname{span}(X_0) \cap \operatorname{span}(U_{0|k-1}) = \{0\}$, where $\operatorname{span}(\cdot)$ denotes the space spanned by the row vectors of a matrix.

Assumption 6.1 A1) implies that the state vector is sufficiently excited, or the system is reachable. Indeed, if A1) is not satisfied, there exists a nonzero vector $\eta \in \mathbb{R}^n$ such that $\eta^TX_0 = 0$, which implies that $X_0 \in \mathbb{R}^{n\times N}$ does not span the $n$-dimensional state space. Assumption 6.1 A2) shows that the input sequence $u \in \mathbb{R}^m$ should satisfy the persistently exciting (PE) condition of order $k$. For the PE condition, see Definition B.1 of Appendix B for more details. Also, A3) means that the row vectors of $X_0$ and $U_{0|k-1}$ are linearly independent, or there is no linear feedback from the states to the inputs. This implies that the input-output data are obtained from an open-loop experiment.


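Lemma 6.3. [118, 119] Suppose that A1) to A3) and $\mathrm{rank}(\mathcal{O}_k) = n$ are satisfied. Then, the following rank condition holds:

$$
\mathrm{rank} \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} = km + n \tag{6.24}
$$

Proof. [86] It follows from (6.22) that

$$
\begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}
= \begin{bmatrix} I_{km} & 0 \\ \Psi_k & \mathcal{O}_k \end{bmatrix}
\begin{bmatrix} U_{0|k-1} \\ X_0 \end{bmatrix} \tag{6.25}
$$

where $k > n$. The first factor on the right-hand side has full column rank $km + n$, since $\mathrm{rank}(\mathcal{O}_k) = n$. By Assumption 6.1, the second factor has full row rank $km + n$. Hence the product has rank $km + n$, which proves (6.24).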
Lemma 6.3 implies that, for the LTI system (6.1), if we delete the row vectors in $Y_{0|k-1}$ that are dependent on the row vectors in $U_{0|k-1}$, there remain exactly $n$ independent row vectors in $Y_{0|k-1}$, where $n$ is the dimension of the state space.
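Assumption 6.1 and the rank condition (6.24) can also be checked on simulated data. The sketch below uses the same second-order system (from Example 6.6), driven by white noise. White noise is persistently exciting of any order. Numerical rank is decided with a relative threshold of $10^{-8}$, an arbitrary choice made for illustration.

```python
import numpy as np

A = np.array([[0.6, 0.4], [-0.4, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])          # D = 0 for this system
n, m, p = 2, 1, 1

def simulate(u, x0):
    x, ys, xs = x0, [], []
    for ut in u:
        xs.append(x)
        ys.append(C @ x)
        x = A @ x + B @ ut
    return np.array(ys), np.array(xs)

def block_hankel(w, k, N):
    """Stack w(i), ..., w(i+N-1) as block row i, for i = 0, ..., k-1."""
    return np.vstack([w[i:i + N].T for i in range(k)])

rng = np.random.default_rng(1)
k, N = 4, 200                        # k > n and N much larger than k(m + p)
u = rng.standard_normal((N + k - 1, m))
y, x = simulate(u, x0=rng.standard_normal(n))
U, Y, X0 = block_hankel(u, k, N), block_hankel(y, k, N), x[:N].T

def rank(M):
    """Numerical rank with a threshold relative to the largest singular value."""
    return np.linalg.matrix_rank(M, tol=1e-8 * np.linalg.norm(M, 2))

checks = {
    "A1: rank(X0) = n": rank(X0) == n,
    "A2: rank(U) = km": rank(U) == k * m,
    "A3: rank([U; X0]) = km + n": rank(np.vstack([U, X0])) == k * m + n,
    "(6.24): rank([U; Y]) = km + n": rank(np.vstack([U, Y])) == k * m + n,
}
for name, ok in checks.items():
    print(name, ok)
```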

In the following, the matrix

$$
W_{0|k-1} = \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}, \qquad k > n
$$

is referred to as the data matrix. In order to study the relation between the block Hankel matrix $H_{k,k}$ formed by the impulse responses and the data matrix $W_{0|k-1}$ defined above, we begin with a simple example.


Example 6.3. Suppose that $y(t) = 0$, $t < 0$. Let $u = (0, 0, 0, 1, 0, 0, \ldots)$ be the unit impulse at $t = 3$. We apply the input $u$ to an LTI system and observe the impulse response delayed by three steps:

$$
y = (0, 0, 0, g_0, g_1, g_2, g_3, \ldots)
$$


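Let $k = 4$, $N = 8$. Then, we have

$$
\begin{bmatrix} U_{0|3} \\ Y_{0|3} \end{bmatrix} =
\left[\begin{array}{cccc|cccc}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & g_0 & g_1 & g_2 & g_3 & g_4 \\
0 & 0 & g_0 & g_1 & g_2 & g_3 & g_4 & g_5 \\
0 & g_0 & g_1 & g_2 & g_3 & g_4 & g_5 & g_6 \\
g_0 & g_1 & g_2 & g_3 & g_4 & g_5 & g_6 & g_7
\end{array}\right] \tag{6.26}
$$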

This data matrix has a particular block structure, in which the upper-right block is a zero matrix and the lower-right block is exactly the Hankel matrix $H_{4,4}$. Also, suppose we post-multiply (6.26) by the nonsingular matrix $\begin{bmatrix} J_4 & 0 \\ 0 & I_4 \end{bmatrix}$, where $J_4$ is the permutation matrix with 1 along the principal anti-diagonal [see (2.39)]. Then the right-hand side of (6.26) takes the form

$$
\begin{bmatrix} I_4 & 0 \\ \Psi_4 & H_{4,4} \end{bmatrix}, \qquad
\Psi_4 = \begin{bmatrix} g_0 & 0 & 0 & 0 \\ g_1 & g_0 & 0 & 0 \\ g_2 & g_1 & g_0 & 0 \\ g_3 & g_2 & g_1 & g_0 \end{bmatrix}
$$

which is similar to the block matrix with zero upper-right block appearing on the right-hand side of (6.25).
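The structure of (6.26) can be reproduced directly. In the sketch below, the Markov parameters $g_0, \ldots, g_7$ are arbitrary hypothetical numbers. The code builds the data matrix for the delayed impulse and checks the zero upper-right block and the Hankel block $H_{4,4}$. It then post-multiplies by $\mathrm{diag}(J_4, I_4)$ to expose the block-triangular form with the Toeplitz block $\Psi_4$.

```python
import numpy as np

g = np.array([1.0, 0.5, 0.25, -0.3, 0.2, 0.1, -0.05, 0.02])   # hypothetical g_0, ..., g_7
k, N = 4, 8
T = N + k - 1                                  # samples t = 0, ..., 10 are used

u = np.zeros(T)
u[3] = 1.0                                     # unit impulse at t = 3
y = np.zeros(T)
y[3:] = g[:T - 3]                              # impulse response delayed by three steps

def hankel_rows(w):
    """Row i holds w(i), ..., w(i+N-1)."""
    return np.array([w[i:i + N] for i in range(k)])

W = np.vstack([hankel_rows(u), hankel_rows(y)])                 # the matrix (6.26)
H44 = np.array([[g[i + j + 1] for j in range(4)] for i in range(4)])
Psi4 = np.array([[g[i - j] if i >= j else 0.0 for j in range(4)] for i in range(4)])

J4 = np.fliplr(np.eye(4))                                       # anti-diagonal permutation
P = np.block([[J4, np.zeros((4, 4))], [np.zeros((4, 4)), np.eye(4)]])
Wp = W @ P                                                      # expected: [[I4, 0], [Psi4, H44]]

print(np.allclose(W[:4, 4:], 0), np.allclose(W[4:, 4:], H44))
print(np.allclose(Wp, np.block([[np.eye(4), np.zeros((4, 4))], [Psi4, H44]])))
```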


Data matrices formed by generic input-output data do not have a nice structure 
like (6.26). However, by exploiting the linearity of the system, we can transform the 
data matrices into block matrices with zeros in the upper-right block. This fact is 
indeed guaranteed by the following lemma. 


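Lemma 6.4. [181] Suppose that the input-output data

$$
\begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}, \qquad k > n \tag{6.27}
$$

are given. Then, under the assumptions of Lemma 6.3, any input-output pair

$$
\tilde u_k(0) = \begin{bmatrix} \tilde u(0) \\ \vdots \\ \tilde u(k-1) \end{bmatrix}, \qquad
\tilde y_k(0) = \begin{bmatrix} \tilde y(0) \\ \vdots \\ \tilde y(k-1) \end{bmatrix}
$$

of length $k$ can be expressed as a linear combination of the column vectors of $W_{0|k-1}$. In other words, there exists a vector $\zeta \in \mathbb{R}^N$ such that

$$
\begin{bmatrix} \tilde u_k(0) \\ \tilde y_k(0) \end{bmatrix} = \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} \zeta \tag{6.28}
$$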


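Proof. Post-multiplying (6.22) by a vector $\zeta \in \mathbb{R}^N$ yields

$$
Y_{0|k-1}\zeta = \mathcal{O}_k X_0 \zeta + \Psi_k U_{0|k-1} \zeta
$$

Thus, $\bar u_k(0) := U_{0|k-1}\zeta$ and $\bar y_k(0) := Y_{0|k-1}\zeta$ form an input-output pair with the initial state vector $\bar x(0) := X_0 \zeta$. This is a version of the well-known principle of superposition for an LTI system.

To prove the lemma, let $(\tilde u_k(0), \tilde y_k(0))$ be an input-output pair. Then, it follows from (6.21) that there exists an initial state $\tilde x(0) \in \mathbb{R}^n$ such that

$$
\tilde y_k(0) = \mathcal{O}_k \tilde x(0) + \Psi_k \tilde u_k(0) \tag{6.29}
$$

By assumption, $\begin{bmatrix} U_{0|k-1} \\ X_0 \end{bmatrix} \in \mathbb{R}^{(km+n) \times N}$ has full row rank, so that there exists a vector $\zeta \in \mathbb{R}^N$ such that

$$
\begin{bmatrix} \tilde u_k(0) \\ \tilde x(0) \end{bmatrix} = \begin{bmatrix} U_{0|k-1} \\ X_0 \end{bmatrix} \zeta
$$

Thus, from (6.29) and (6.25), we have

$$
\begin{bmatrix} \tilde u_k(0) \\ \tilde y_k(0) \end{bmatrix}
= \begin{bmatrix} I_{km} & 0_{km \times n} \\ \Psi_k & \mathcal{O}_k \end{bmatrix}
\begin{bmatrix} \tilde u_k(0) \\ \tilde x(0) \end{bmatrix}
= \begin{bmatrix} I_{km} & 0_{km \times n} \\ \Psi_k & \mathcal{O}_k \end{bmatrix}
\begin{bmatrix} U_{0|k-1} \\ X_0 \end{bmatrix} \zeta
= \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} \zeta
$$

as was to be proved.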


The above lemma ensures that any input-output pair of length $k$ can be generated from sufficiently long input-output data, provided that the input satisfies the PE condition of Assumption 6.1.
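Lemma 6.4 can be illustrated by solving (6.28) for $\zeta$ with a least-squares solver. The sketch uses the second-order system of Example 6.6 and white-noise data. It generates a fresh input-output pair of length $k$ (new input, new initial state) and shows that this pair lies in the column space of the data matrix. A generic vector of the same size does not.

```python
import numpy as np

A = np.array([[0.6, 0.4], [-0.4, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])          # D = 0
n, m, p = 2, 1, 1

def simulate(u, x0):
    x, ys = x0, []
    for ut in u:
        ys.append(C @ x)
        x = A @ x + B @ ut
    return np.array(ys)

def block_hankel(w, k, N):
    return np.vstack([w[i:i + N].T for i in range(k)])

rng = np.random.default_rng(2)
k, N = 4, 100
u = rng.standard_normal((N + k - 1, m))
y = simulate(u, rng.standard_normal(n))
W = np.vstack([block_hankel(u, k, N), block_hankel(y, k, N)])   # data (6.27)

# A new input-output pair of length k, with its own input and initial state.
u_new = rng.standard_normal((k, m))
y_new = simulate(u_new, rng.standard_normal(n))
pair = np.concatenate([u_new.ravel(), y_new.ravel()])

zeta, *_ = np.linalg.lstsq(W, pair, rcond=None)                  # solve (6.28)
res_pair = np.abs(W @ zeta - pair).max()

# A generic vector of the same size is not an input-output pair of the system.
generic = rng.standard_normal(k * (m + p))
z2, *_ = np.linalg.lstsq(W, generic, rcond=None)
res_generic = np.abs(W @ z2 - generic).max()
print(res_pair, res_generic)
```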


Example 6.4. Consider a scalar system described by

$$
y(t) = a y(t-1) + u(t), \qquad t = 0, 1, \ldots; \quad y(-1) = 0
$$

Then, the output is expressed as

$$
y(t) = a^t u(0) + a^{t-1} u(1) + \cdots + a u(t-1) + u(t), \qquad t = 0, 1, \ldots
$$

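Suppose that we have the following set of input-output data.

$$
\begin{array}{c|r|l}
t & u(t) & y(t) \\ \hline
0 & 1 & y_0 = 1 \\
1 & 2 & y_1 = a + 2 \\
2 & 1 & y_2 = a^2 + 2a + 1 \\
3 & -1 & y_3 = a^3 + 2a^2 + a - 1 \\
4 & -1 & y_4 = a^4 + 2a^3 + a^2 - a - 1 \\
5 & 1 & y_5 = a^5 + 2a^4 + a^3 - a^2 - a + 1 \\
6 & 1 & y_6 = a^6 + 2a^5 + a^4 - a^3 - a^2 + a + 1 \\
7 & -1 & y_7 = a^7 + 2a^6 + a^5 - a^4 - a^3 + a^2 + a - 1
\end{array}
$$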


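Let $k = 3$, $N = 6$. Then, the data matrix is given by

$$
\begin{bmatrix} U_{0|2} \\ Y_{0|2} \end{bmatrix} =
\left[\begin{array}{ccc|ccc}
1 & 2 & 1 & -1 & -1 & 1 \\
2 & 1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 \\ \hline
y_0 & y_1 & y_2 & y_3 & y_4 & y_5 \\
y_1 & y_2 & y_3 & y_4 & y_5 & y_6 \\
y_2 & y_3 & y_4 & y_5 & y_6 & y_7
\end{array}\right] \tag{6.30}
$$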


We observe that the three column vectors in the upper-left $3 \times 3$ block are linearly independent. By applying column operations using these three column vectors, we make the upper-right $3 \times 3$ block a zero matrix. These column operations also add combinations of the column vectors in the lower-left block to the lower-right block, so that


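$$
\left[\begin{array}{ccc|ccc}
1 & 2 & 1 & 0 & 0 & 0 \\
2 & 1 & -1 & 0 & 0 & 0 \\
1 & -1 & -1 & 0 & 0 & 0 \\ \hline
y_0 & y_1 & y_2 & y_3' & y_4' & y_5' \\
y_1 & y_2 & y_3 & a y_3' & a y_4' & a y_5' \\
y_2 & y_3 & y_4 & a^2 y_3' & a^2 y_4' & a^2 y_5'
\end{array}\right] \tag{6.31}
$$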


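In fact, taking $\zeta = (-2/3\ \ 4/3\ \ -1\ \ 1\ \ 0\ \ 0)^T \in \mathbb{R}^6$, it follows that

$$
\left[\begin{array}{ccc|ccc}
1 & 2 & 1 & -1 & -1 & 1 \\
2 & 1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 \\ \hline
y_0 & y_1 & y_2 & y_3 & y_4 & y_5 \\
y_1 & y_2 & y_3 & y_4 & y_5 & y_6 \\
y_2 & y_3 & y_4 & y_5 & y_6 & y_7
\end{array}\right]
\begin{bmatrix} -2/3 \\ 4/3 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ y_3' \\ a y_3' \\ a^2 y_3' \end{bmatrix}
$$

where $y_3' = a^3 + a^2 + a/3$. We see that $y_3'$ is the output at $t = 3$ due to the input $u = (1\ \ 1\ \ 1/3\ \ 0)$, so that $(y_3', a y_3', a^2 y_3')$ is a zero-input response with the initial condition $x(0) = y_3'$. Similarly, we can show that $y_4' = a^4 + 2a^3 + 2a^2 + a$ and $y_5' = a^5 + 2a^4 + a^3 - a/3$. Thus $y_4'$ and $y_5'$ are the outputs at $t = 4$ and $t = 5$ due to the inputs $u = (1\ \ 2\ \ 2\ \ 1\ \ 0)$ and $u = (1\ \ 2\ \ 1\ \ 0\ \ -1/3\ \ 0)$, respectively. It should be noted that these inputs are fictitious inputs, which are not actually fed into the system.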

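We can write the lower-right block of (6.31) as

$$
\begin{bmatrix} 1 \\ a \\ a^2 \end{bmatrix} \begin{bmatrix} y_3' & y_4' & y_5' \end{bmatrix}
= \begin{bmatrix} C \\ CA \\ CA^2 \end{bmatrix} \begin{bmatrix} y_3' & y_4' & y_5' \end{bmatrix}
= \mathcal{O}_3 \begin{bmatrix} y_3' & y_4' & y_5' \end{bmatrix}
$$

Clearly, the image of the above matrix is equal to the image of the extended observability matrix $\mathcal{O}_3 \in \mathbb{R}^{3 \times 1}$. Thus, we have $C = 1$, $A = a$.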
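The column operation of Example 6.4 can be checked numerically for a particular value of $a$. The sketch below takes $a = 0.9$, the value used in Example 6.5. It builds the data matrix (6.30) and uses the first three columns to annihilate the upper-right block. It then confirms the vector $\zeta$ given above and the factorization of the lower-right block as $\mathcal{O}_3 [\,y_3'\ y_4'\ y_5'\,]$.

```python
import numpy as np

a = 0.9
u = np.array([1, 2, 1, -1, -1, 1, 1, -1], dtype=float)   # input of Example 6.4
y = np.zeros_like(u)
for t in range(len(u)):                                  # y(t) = a y(t-1) + u(t), y(-1) = 0
    y[t] = (a * y[t - 1] if t > 0 else 0.0) + u[t]

k, N = 3, 6
U = np.array([u[i:i + N] for i in range(k)])
Y = np.array([y[i:i + N] for i in range(k)])
W = np.vstack([U, Y])                                    # data matrix (6.30)

# Column operation: use the first three columns to zero the upper-right block.
alpha = -np.linalg.solve(U[:, :k], U[:, k:])
Wz = W.copy()
Wz[:, k:] = W[:, k:] + W[:, :k] @ alpha                  # the matrix (6.31)

zeta = np.array([-2/3, 4/3, -1, 1, 0, 0])
print(np.round(W @ zeta, 6))                             # [0, 0, 0, y3', a y3', a^2 y3']
y_prime = Wz[k, k:]                                      # [y3', y4', y5']
O3 = np.array([1.0, a, a**2])
print(np.round(y_prime, 6), np.allclose(Wz[k:, k:], np.outer(O3, y_prime)))
```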


We need a column operation to make the upper-right block of the data matrix a 
zero matrix as shown in (6.31). However, this is easily performed by means of the 
LQ decomposition, which is the dual of the QR decomposition. 


6.4 LQ Decomposition 


We usually consider rectangular data matrices with a large number of columns. Thus 
if we apply the LQ decomposition to rectangular matrices, then we get block lower 
triangular matrices with a zero block at the upper-right corner. 

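Let the LQ decomposition of a data matrix be given by

$$
\begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}
= \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{6.32}
$$

where $L_{11} \in \mathbb{R}^{km \times km}$, $L_{21} \in \mathbb{R}^{kp \times km}$, $L_{22} \in \mathbb{R}^{kp \times kp}$, with $L_{11}$ and $L_{22}$ lower triangular. The matrices $Q_1 \in \mathbb{R}^{N \times km}$ and $Q_2 \in \mathbb{R}^{N \times kp}$ are orthogonal, i.e., $Q_1^T Q_1 = I_{km}$, $Q_2^T Q_2 = I_{kp}$, and $Q_1^T Q_2 = 0$. The actual computation of the LQ decomposition is performed by taking the transpose of the QR decomposition of the tall matrix

$$
\begin{bmatrix} U_{0|k-1}^T & Y_{0|k-1}^T \end{bmatrix} \in \mathbb{R}^{N \times k(m+p)}
$$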


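A MATLAB® program for the LQ decomposition is displayed in Table 6.1.

Table 6.1. LQ decomposition

% Program
% LQ decomposition
function [L11,L21,L22]=lq(U,Y)
km=size(U,1); kp=size(Y,1);
[Q,L]=qr([U;Y]',0);   % economy-size QR of the tall matrix [U' Y']
Q=Q'; L=L';           % transpose, so that [U;Y] = L*Q with L lower triangular
L11=L(1:km,1:km);
L21=L(km+1:km+kp,1:km);
L22=L(km+1:km+kp,km+1:km+kp);

Example 6.5. Let $a = 0.9$ in Example 6.4. Then, from (6.30) and (6.31), it follows that

$$
\begin{bmatrix} U_{0|2} \\ Y_{0|2} \end{bmatrix} =
\left[\begin{array}{ccc|ccc}
1 & 2 & 1 & -1 & -1 & 1 \\
2 & 1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 \\ \hline
1 & 2.9 & 3.61 & 2.249 & 1.0241 & 1.92169 \\
2.9 & 3.61 & 2.249 & 1.0241 & 1.92169 & 2.729521 \\
3.61 & 2.249 & 1.0241 & 1.92169 & 2.729521 & 1.4565689
\end{array}\right] \tag{6.33}
$$

and

$$
\left[\begin{array}{ccc|ccc}
1 & 2 & 1 & 0 & 0 & 0 \\
2 & 1 & -1 & 0 & 0 & 0 \\
1 & -1 & -1 & 0 & 0 & 0 \\ \hline
1 & 2.9 & 3.61 & 1.839 & 4.6341 & 2.33169 \\
2.9 & 3.61 & 2.249 & 1.6551 & 4.17069 & 2.098521 \\
3.61 & 2.249 & 1.0241 & 1.48959 & 3.753621 & 1.8886689
\end{array}\right] \tag{6.34}
$$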

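Also, the LQ decomposition of (6.33) gives

$$
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} =
\left[\begin{array}{rrr|rrr}
-3.0000 & 0 & 0 & 0 & 0 & 0 \\
-1.3333 & 2.6874 & 0 & 0 & 0 & 0 \\
1.6667 & 1.1990 & -1.3359 & 0 & 0 & 0 \\ \hline
-3.0195 & -0.7588 & -1.3353 & -4.5569 & 0 & 0 \\
-4.0509 & 2.0045 & -1.2017 & -4.1012 & 0 & 0 \\
-1.9792 & 3.0030 & -2.4175 & -3.6911 & 0 & 0
\end{array}\right] \tag{6.35}
$$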


We see that in (6.34) and (6.35), multiplying the first row of the lower-right block 
by 0.9 yields the second row, and multiplying the second row by 0.9 yields the third 
row, so that the rank of these matrices is one, which is the same as the dimension of 
the system treated in Example 6.4. 
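The routine of Table 6.1 can also be mirrored outside MATLAB with NumPy's QR factorization; the function name `lq` below is our own. Applied to the data (6.33), it should reproduce (6.35) up to the signs of individual columns of the L-matrix, which depend on the QR implementation. For that reason the check compares absolute values and the rank-one structure of $L_{22}$.

```python
import numpy as np

def lq(U, Y):
    """LQ decomposition [U; Y] = [[L11, 0], [L21, L22]] @ Q.T via QR of the transpose."""
    km, kp = U.shape[0], Y.shape[0]
    Q, R = np.linalg.qr(np.vstack([U, Y]).T, mode="reduced")
    L = R.T                                    # lower triangular
    return L[:km, :km], L[km:km + kp, :km], L[km:km + kp, km:km + kp], Q

a = 0.9
u = np.array([1, 2, 1, -1, -1, 1, 1, -1], dtype=float)
y = np.zeros_like(u)
for t in range(len(u)):
    y[t] = (a * y[t - 1] if t > 0 else 0.0) + u[t]

k, N = 3, 6
U = np.array([u[i:i + N] for i in range(k)])
Y = np.array([y[i:i + N] for i in range(k)])
L11, L21, L22, Q = lq(U, Y)

print(np.round(L11, 4))
print(np.round(L21, 4))
print(np.round(L22, 4))
# Rank one, and each row of L22 is 0.9 times the previous one.
print(np.linalg.matrix_rank(L22, tol=1e-8), L22[1, 0] / L22[0, 0], L22[2, 0] / L22[1, 0])
```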


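From (6.32), we obtain

$$
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
= \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}
\begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \tag{6.36}
$$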


The following lemma provides a system-theoretic meaning of the L-matrix in terms of zero-input responses of the system.


Lemma 6.5. Under Assumption 6.1, each column of the L-matrix is an input-output pair; in particular, each column of $L_{22}$ contains a zero-input response of the system. Moreover, we have $\mathrm{rank}(L_{22}) = n$, i.e., the dimension of the system.

Proof. Since $Q_1$, $Q_2$ are formed by $N$-dimensional column vectors, it follows from (6.28) of Lemma 6.4 that each column of the L-matrix of (6.36) is an input-output pair. Since $L_{12} = 0$, we see that $L_{22}$ consists of zero-input responses. Recall from (6.8) and (6.21) that the zero-input response is expressed as $y_k(0) = \mathcal{O}_k x(0)$. We see that the number of independent zero-input responses is $n = \dim(x)$, so that we have the desired result.


The scenario of the realization procedure based on the LQ decomposition is to compute the SVD of $L_{22}$ in order to recover the information about the extended observability matrix, and then to estimate the matrices $A$ and $C$ by using the relation (6.12). On the other hand, the information about the matrices $B$ and $D$ is included in the matrices $L_{11}$ and $L_{21}$ of (6.32). To retrieve this information, however, the matrix input-output equation (6.22) [and/or (6.23)] should be employed together with $L_{11}$ and $L_{21}$, as explained in the next section.

Thus, in the next section, we shall present a solution to Problem B, stated in Sec- 
tion 6.1, based on a subspace identification method, called the MOESP method, in 
which the LQ decomposition technique and the SVD are employed. Another solution 
to Problem B is provided by the N4SID subspace identification method, which will 
be discussed in Section 6.6. 


6.5 MOESP Method 


In this section, we discuss the basic subspace identification method called the MOESP method⁴ due to Verhaegen and Dewilde [172, 173]. In the following, the orthogonal projection is expressed as $\hat E\{\cdot \mid \cdot\}$.

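We see from (6.32) that

$$
U_{0|k-1} = L_{11} Q_1^T \tag{6.37a}
$$
$$
Y_{0|k-1} = L_{21} Q_1^T + L_{22} Q_2^T \tag{6.37b}
$$

where $L_{11} \in \mathbb{R}^{km \times km}$, $L_{22} \in \mathbb{R}^{kp \times kp}$ are lower triangular, and $Q_1 \in \mathbb{R}^{N \times km}$, $Q_2 \in \mathbb{R}^{N \times kp}$ are orthogonal. Under Assumption 6.1, $L_{11}$ is nonsingular (since $\mathrm{rank}(U_{0|k-1}) = km$ by A2)), so that $Q_1^T = L_{11}^{-1} U_{0|k-1}$. Thus, (6.37b) is written as

$$
Y_{0|k-1} = L_{21} L_{11}^{-1} U_{0|k-1} + L_{22} Q_2^T
$$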


Since $Q_1^T Q_2 = 0$, the first term on the right-hand side of the above equation is spanned by the row vectors in $U_{0|k-1}$, and the second term is orthogonal to it. Hence, the orthogonal projection of the row space of $Y_{0|k-1}$ onto the row space of $U_{0|k-1}$ is given by

$$
\hat E\{Y_{0|k-1} \mid U_{0|k-1}\} = L_{21} Q_1^T = L_{21} L_{11}^{-1} U_{0|k-1}
$$


Also, the orthogonal projection of the row space of $Y_{0|k-1}$ onto the complement $U_{0|k-1}^{\perp}$ of the row space of $U_{0|k-1}$ is given by

$$
\hat E\{Y_{0|k-1} \mid U_{0|k-1}^{\perp}\} = L_{22} Q_2^T
$$

In summary, the right-hand side of (6.37b) is the orthogonal sum decomposition of the output matrix $Y_{0|k-1}$ onto the row space of the input matrix $U_{0|k-1}$ and its complement.

⁴ MOESP = Multivariable Output Error State Space.
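The orthogonal decomposition (6.37b) can be compared with the familiar projection formula $Y U^T (U U^T)^{-1} U$. In the sketch below, random matrices stand in for $U_{0|k-1}$ and $Y_{0|k-1}$ (an assumption made only to exercise the algebra), and the LQ factors come from NumPy's QR decomposition of the transpose.

```python
import numpy as np

rng = np.random.default_rng(3)
km, kp, N = 4, 4, 200
U = rng.standard_normal((km, N))                                       # stand-in for U_{0|k-1}
Y = rng.standard_normal((kp, km)) @ U + rng.standard_normal((kp, N))   # stand-in for Y_{0|k-1}

Q, R = np.linalg.qr(np.vstack([U, Y]).T)       # LQ via QR of the transpose
L = R.T
L11, L21, L22 = L[:km, :km], L[km:, :km], L[km:, km:]
Q1t, Q2t = Q[:, :km].T, Q[:, km:].T

proj_U = L21 @ Q1t                             # projection onto the row space of U
proj_perp = L22 @ Q2t                          # projection onto its orthogonal complement
direct = Y @ U.T @ np.linalg.solve(U @ U.T, U) # Y U^T (U U^T)^{-1} U

print(np.abs(proj_U - direct).max())                             # same projection
print(np.abs(proj_U - L21 @ np.linalg.solve(L11, U)).max())      # equals L21 L11^{-1} U
print(np.abs(proj_U + proj_perp - Y).max(), np.abs(proj_perp @ U.T).max())
```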

Also, it follows from (6.22) and (6.37) that

$$
\mathcal{O}_k X_0 + \Psi_k L_{11} Q_1^T = L_{21} Q_1^T + L_{22} Q_2^T \tag{6.38}
$$

Here it should be noted that the right-hand side is an orthogonal sum, whereas the left-hand side is a direct sum, so that the two quantities on the left are not necessarily orthogonal. This implies that, in general, $\mathcal{O}_k X_0 \neq L_{22} Q_2^T$ and $\Psi_k L_{11} Q_1^T \neq L_{21} Q_1^T$.
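Post-multiplying (6.38) by $Q_2$ yields

$$
\mathcal{O}_k X_0 Q_2 = L_{22}
$$

where $Q_1^T Q_2 = 0$ and $Q_2^T Q_2 = I_{kp}$ are used. Under the assumptions of Lemma 6.3, $\mathrm{rank}(\mathcal{O}_k) = n$, and by Lemma 6.5, $\mathrm{rank}(L_{22}) = n$. Hence the product $X_0 Q_2$ has full row rank $n$, and the column space of $L_{22}$ coincides with that of $\mathcal{O}_k$. Thus we can obtain the image of the extended observability matrix $\mathcal{O}_k$, and hence the dimension $n$, from the SVD of $L_{22} \in \mathbb{R}^{kp \times kp}$.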

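Let the SVD of $L_{22}$ be given by

$$
L_{22} = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = U_1 \Sigma_1 V_1^T \tag{6.39}
$$

where $U_1 \in \mathbb{R}^{kp \times n}$ and $U_2 \in \mathbb{R}^{kp \times (kp-n)}$. Then, we have

$$
\mathcal{O}_k X_0 Q_2 = U_1 \Sigma_1 V_1^T
$$

so that we define the extended observability matrix as

$$
\mathcal{O}_k = U_1 \Sigma_1^{1/2} \tag{6.40}
$$

and $n = \dim \Sigma_1$. The matrix $C$ is readily given by

$$
C = \mathcal{O}_k(1:p,\, 1:n) \tag{6.41}
$$

and $A$ is obtained by solving the linear equation (see Lemma 6.1)

$$
\mathcal{O}_k(1:p(k-1),\, 1:n)\, A = \mathcal{O}_k(p+1:kp,\, 1:n) \tag{6.42}
$$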
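Equations (6.39) to (6.42) translate into a few lines of linear algebra. The following sketch uses noise-free data from the system of Example 6.6, a relative threshold of $10^{-8}$ for deciding the order, and a least-squares solve of (6.42). Because $\mathcal{O}_k$ is determined only up to a similarity transformation, the check compares the eigenvalues of the estimated $A$ (true values $0.6 \pm 0.4j$) rather than its entries.

```python
import numpy as np

A = np.array([[0.6, 0.4], [-0.4, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])          # D = 0
m, p = 1, 1

rng = np.random.default_rng(4)
k, N = 8, 500
u = rng.standard_normal((N + k - 1, m))
x, y = rng.standard_normal(2), []
for ut in u:                                   # noise-free simulation
    y.append(C @ x)
    x = A @ x + B @ ut
y = np.array(y)

def hank(w):
    return np.vstack([w[i:i + N].T for i in range(k)])

# LQ decomposition (6.32) via QR of the transpose; only L22 is needed here.
_, R = np.linalg.qr(np.vstack([hank(u), hank(y)]).T)
L22 = R.T[k * m:, k * m:]

# SVD (6.39); n = number of singular values above the threshold.
U_s, s, _ = np.linalg.svd(L22)
n = int(np.sum(s > 1e-8 * s[0]))
Ok = U_s[:, :n] * np.sqrt(s[:n])               # (6.40): O_k = U_1 Sigma_1^{1/2}

C_hat = Ok[:p, :]                                                        # (6.41)
A_hat = np.linalg.lstsq(Ok[:p * (k - 1), :], Ok[p:, :], rcond=None)[0]   # (6.42)

eig = sorted(np.linalg.eigvals(A_hat), key=lambda z: z.imag)
print(n, np.round(eig, 6))                     # true eigenvalues: 0.6 -/+ 0.4j
```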


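Now we consider the estimation of the matrices $B$ and $D$. Since $U_2^T L_{22} = 0$ and $U_2^T \mathcal{O}_k = 0$, pre-multiplying (6.38) by $U_2^T \in \mathbb{R}^{(kp-n) \times kp}$ yields

$$
U_2^T \Psi_k L_{11} Q_1^T = U_2^T L_{21} Q_1^T
$$

Further, post-multiplying this equation by $Q_1$ and then by $L_{11}^{-1}$ yields

$$
U_2^T \begin{bmatrix}
D & 0 & \cdots & 0 \\
CB & D & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
CA^{k-2}B & CA^{k-3}B & \cdots & D
\end{bmatrix} = U_2^T L_{21} L_{11}^{-1} \tag{6.43}
$$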


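This is a linear equation with respect to $B$ and $D$, so that we can use the least-squares method to find them. In fact, define

$$
U_2^T = \begin{bmatrix} \mathcal{L}_1 & \mathcal{L}_2 & \cdots & \mathcal{L}_k \end{bmatrix}, \qquad
U_2^T L_{21} L_{11}^{-1} = \begin{bmatrix} \mathcal{M}_1 & \mathcal{M}_2 & \cdots & \mathcal{M}_k \end{bmatrix}
$$

where $\mathcal{L}_i \in \mathbb{R}^{(kp-n) \times p}$, $i = 1, \ldots, k$, and $\mathcal{M}_i \in \mathbb{R}^{(kp-n) \times m}$. Thus, from (6.43),

$$
\begin{aligned}
\mathcal{L}_1 D + \mathcal{L}_2 CB + \cdots + \mathcal{L}_{k-1} CA^{k-3}B + \mathcal{L}_k CA^{k-2}B &= \mathcal{M}_1 \\
\mathcal{L}_2 D + \mathcal{L}_3 CB + \cdots + \mathcal{L}_k CA^{k-3}B &= \mathcal{M}_2 \\
&\ \,\vdots \\
\mathcal{L}_{k-1} D + \mathcal{L}_k CB &= \mathcal{M}_{k-1} \\
\mathcal{L}_k D &= \mathcal{M}_k
\end{aligned}
$$

Defining $\bar{\mathcal{L}}_i = [\,\mathcal{L}_i\ \cdots\ \mathcal{L}_k\,] \in \mathbb{R}^{(kp-n) \times (k-i+1)p}$, $i = 2, \ldots, k$, we get the following overdetermined linear equations:

$$
\begin{bmatrix}
\mathcal{L}_1 & \bar{\mathcal{L}}_2 \mathcal{O}_{k-1} \\
\mathcal{L}_2 & \bar{\mathcal{L}}_3 \mathcal{O}_{k-2} \\
\vdots & \vdots \\
\mathcal{L}_{k-1} & \bar{\mathcal{L}}_k \mathcal{O}_1 \\
\mathcal{L}_k & 0
\end{bmatrix}
\begin{bmatrix} D \\ B \end{bmatrix}
= \begin{bmatrix} \mathcal{M}_1 \\ \mathcal{M}_2 \\ \vdots \\ \mathcal{M}_{k-1} \\ \mathcal{M}_k \end{bmatrix} \tag{6.44}
$$

where the block coefficient matrix on the left-hand side is $k(kp-n) \times (p+n)$-dimensional. To obtain a unique least-squares solution $(D, B)$ of (6.44), the block matrix must have full column rank, so that $k(kp-n) \geq p+n$ should be satisfied. It can be shown that if $k > n$, this condition is satisfied.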

Summarizing the above, we can provide a subspace identification method that solves Problem B. Suppose that we have the input and output data $U_{0|k-1}$ and $Y_{0|k-1}$. Then, we have the following lemma.


Lemma 6.6. (MOESP algorithm)
Step 1: Compute the LQ decomposition (6.32).
Step 2: Compute the SVD (6.39), let $n := \dim \Sigma_1$, and define the extended observability matrix as $\mathcal{O}_k = U_1 \Sigma_1^{1/2}$.
Step 3: Obtain $C$ and $A$ from (6.41) and (6.42), respectively.
Step 4: Solve (6.44) by the least-squares method to estimate $B$ and $D$.
In [172, 173], this algorithm is called the ordinary MOESP, and a program of the above algorithm is given in Table D.2 of Appendix D.
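Putting Steps 1 to 4 together gives a compact sketch of the ordinary MOESP algorithm. This is not the program of Table D.2. It is a Python illustration that assumes noise-free data and a fixed relative threshold for the order, and it stacks the coefficient matrix of (6.44) one block row at a time. Since the realization is unique only up to similarity, the check compares the Markov parameters $CA^{i}B$ and $D$ with those of the system of Example 6.6.

```python
import numpy as np

def moesp(u, y, k, tol=1e-8):
    """Sketch of ordinary MOESP (Lemma 6.6). u: (T, m), y: (T, p)."""
    m, p = u.shape[1], y.shape[1]
    N = u.shape[0] - k + 1

    def hank(w):
        return np.vstack([w[i:i + N].T for i in range(k)])

    # Step 1: LQ decomposition (6.32) through the QR decomposition of the transpose.
    _, R = np.linalg.qr(np.vstack([hank(u), hank(y)]).T)
    L = R.T
    L11, L21, L22 = L[:k*m, :k*m], L[k*m:, :k*m], L[k*m:, k*m:]
    # Step 2: SVD (6.39), order n, and O_k = U_1 Sigma_1^{1/2} (6.40).
    Us, s, _ = np.linalg.svd(L22)
    n = int(np.sum(s > tol * s[0]))
    Ok = Us[:, :n] * np.sqrt(s[:n])
    # Step 3: C from (6.41) and A from (6.42).
    C = Ok[:p]
    A = np.linalg.lstsq(Ok[:-p], Ok[p:], rcond=None)[0]
    # Step 4: stack (6.44) and solve for [D; B] by least squares.
    U2t = Us[:, n:].T                                   # U_2^T = [L_1 ... L_k]
    Mt = U2t @ L21 @ np.linalg.inv(L11)                 # [M_1 ... M_k]
    rows, rhs = [], []
    for j in range(k):                                  # block row j + 1 of (6.44)
        Lj = U2t[:, j*p:(j+1)*p]
        if j < k - 1:
            right = U2t[:, (j+1)*p:] @ Ok[:(k-j-1)*p]   # Lbar_{j+2} O_{k-j-1}
        else:
            right = np.zeros((U2t.shape[0], n))
        rows.append(np.hstack([Lj, right]))
        rhs.append(Mt[:, j*m:(j+1)*m])
    DB = np.linalg.lstsq(np.vstack(rows), np.vstack(rhs), rcond=None)[0]
    return A, DB[p:], C, DB[:p]

# Noise-free data from the system of Example 6.6 (zero initial state).
A0 = np.array([[0.6, 0.4], [-0.4, 0.6]])
B0 = np.array([[0.0], [1.0]])
C0 = np.array([[1.0, 0.5]])
D0 = np.zeros((1, 1))
rng = np.random.default_rng(5)
u = rng.standard_normal((300, 1))
x, y = np.zeros(2), []
for ut in u:
    y.append(C0 @ x + D0 @ ut)
    x = A0 @ x + B0 @ ut
y = np.array(y)

A, B, C, D = moesp(u, y, k=8)

def markov(A, B, C, count=5):
    """First few Markov parameters C A^i B (invariant under similarity)."""
    return np.array([(C @ np.linalg.matrix_power(A, i) @ B)[0, 0] for i in range(count)])

print(A.shape[0], np.round(markov(A, B, C), 6), np.round(markov(A0, B0, C0), 6))
```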


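Example 6.6. Consider a simple second-order system with

$$
A = \begin{bmatrix} 0.6 & 0.4 \\ -0.4 & 0.6 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0.5 \end{bmatrix}, \quad D = 0
$$

The transfer function is then given by

$$
G(z) = \frac{0.5z + 0.1}{z^2 - 1.2z + 0.52} = \frac{b_0 z^2 + b_1 z + b_2}{z^2 + a_1 z + a_2}
$$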


We have performed simulation studies using 100 sets of input-output data of length $N = 100$. The input is a white noise with mean zero and unit variance, and a white noise $v$ is added to the output $y$ so that the S/N ratio at the output becomes approximately $\sigma_y^2/\sigma_v^2 \approx 100$.

Using the MOESP algorithm of Lemma 6.6 with $k = 8$, we have identified the dimension $n$ and the parameters of the transfer function. The means and standard deviations of the first five singular values of $L_{22}$ are displayed in Table 6.2, where s.d. denotes the standard deviation.


Table 6.2. Singular values of $L_{22}$

        σ1        σ2       σ3       σ4       σ5
mean    15.4313   5.7956   1.1010   1.0354   0.9664
s.d.    1.8690    0.5360   0.1004   0.0887   0.0790


The singular values $\sigma_i$, $i = 3, 4, \ldots$, are relatively small compared with the first two, $\sigma_1$ and $\sigma_2$, so that the dimension is correctly identified as $n = 2$. It should be noted that for the noise-free case ($v = 0$), we observed that $\sigma_i$, $i = 3, 4, \ldots$, are nearly zero (of the order of $10^{-14}$). Also, the identification result of the transfer function is displayed in Table 6.3. Thus we see that for this simple system, the identification result is quite satisfactory.

Table 6.3. Simulation result

        a1        a2       b0        b1       b2
True    -1.2000   0.5200   0.0000    0.5000   0.1000
mean    -1.1999   0.5204   -0.0016   0.5019   0.1002
s.d.    0.0147    0.0110   0.0102    0.0156   0.0185
6.6 N4SID Method


In this section, we show another method of solving Problem B. This is done by introducing the basic idea of the subspace identification method called the N4SID method⁵, developed by Van Overschee and De Moor [164, 165]. Prior to describing the method, we briefly review the oblique projection in subspaces (see Section 2.5).

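Let $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ be the row spaces generated by the row vectors of matrices $A$, $B$, $C$, respectively. We assume that $\mathcal{B} \cap \mathcal{C} = \{0\}$, which corresponds to Condition A3) of Assumption 6.1. For $a \in \mathcal{A}$, we have the following decomposition:

$$
\hat E\{a \mid \mathcal{B} \vee \mathcal{C}\} = \hat E_{\parallel \mathcal{C}}\{a \mid \mathcal{B}\} + \hat E_{\parallel \mathcal{B}}\{a \mid \mathcal{C}\} \tag{6.45}
$$

Here the left-hand side is the orthogonal projection, while the right-hand side is a direct sum decomposition. The term $\hat E_{\parallel \mathcal{C}}\{a \mid \mathcal{B}\}$ is the oblique projection of $a$ onto $\mathcal{B}$ along $\mathcal{C}$, and $\hat E_{\parallel \mathcal{B}}\{a \mid \mathcal{C}\}$ is the oblique projection of $a$ onto $\mathcal{C}$ along $\mathcal{B}$.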
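Numerically, the two oblique projections in (6.45) can be obtained in two stages. First compute the coefficients of the orthogonal projection onto the joint row space $\mathcal{B} \vee \mathcal{C}$, then split them between the two sets of rows. The sketch below does this for random matrices, for which $\mathcal{B} \cap \mathcal{C} = \{0\}$ holds generically. `oblique_split` is a hypothetical helper that requires the stacked matrix to have full row rank.

```python
import numpy as np

def oblique_split(Amat, Bmat, Cmat):
    """Return (E_{||C}{A | B}, E_{||B}{A | C}) for row spaces; [B; C] must have full row rank."""
    BC = np.vstack([Bmat, Cmat])
    coef = np.linalg.solve(BC @ BC.T, BC @ Amat.T).T     # A BC^T (BC BC^T)^{-1}
    nb = Bmat.shape[0]
    return coef[:, :nb] @ Bmat, coef[:, nb:] @ Cmat

rng = np.random.default_rng(6)
N = 50
Bm, Cm = rng.standard_normal((2, N)), rng.standard_normal((3, N))
G1, G2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 3))
Am = G1 @ Bm + G2 @ Cm + 0.1 * rng.standard_normal((2, N))   # rows of A are not in B v C

PB, PC = oblique_split(Am, Bm, Cm)
BC = np.vstack([Bm, Cm])
orth = Am @ np.linalg.pinv(BC) @ BC                            # orthogonal projection onto B v C
print(np.abs(PB + PC - orth).max())                            # (6.45)

# If the rows of A lie in B v C, the oblique parts recover G1 Bm and G2 Cm exactly.
PB0, PC0 = oblique_split(G1 @ Bm + G2 @ Cm, Bm, Cm)
print(np.abs(PB0 - G1 @ Bm).max(), np.abs(PC0 - G2 @ Cm).max())
```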

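Let $k > n$ be the present time. Define $U_p := U_{0|k-1}$, $Y_p := Y_{0|k-1}$, $X_p := X_0$ and $U_f := U_{k|2k-1}$, $Y_f := Y_{k|2k-1}$, $X_f := X_k$, where the subscripts $p$ and $f$ denote the past and future, respectively. In order to explain the N4SID method, we recall the two matrix input-output equations derived in Section 6.3, i.e.,

$$
Y_p = \mathcal{O}_k X_p + \Psi_k U_p \tag{6.46}
$$
$$
Y_f = \mathcal{O}_k X_f + \Psi_k U_f \tag{6.47}
$$

Further, define $W_p, W_f \in \mathbb{R}^{k(m+p) \times N}$ as

$$
W_p = \begin{bmatrix} U_p \\ Y_p \end{bmatrix} = \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}, \qquad
W_f = \begin{bmatrix} U_f \\ Y_f \end{bmatrix} = \begin{bmatrix} U_{k|2k-1} \\ Y_{k|2k-1} \end{bmatrix}
$$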


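The following lemma explains the role of the state vector for an LTI system.

Lemma 6.7. [41, 118] Suppose that $\mathrm{rank}(\mathcal{O}_k) = \mathrm{rank}(\mathcal{C}_k) = n$ with $k > n$. Under A1) to A3) of Assumption 6.1 with $k$ replaced by $2k$, the following relation holds:

$$
\mathrm{span}(X_f) = \mathrm{span}(W_p) \cap \mathrm{span}(W_f) \tag{6.48}
$$

Proof. First we show that $\mathrm{rank}(X_f) = n$. It follows from (6.1a) that

$$
x(k+i) = A^k x(i) + \begin{bmatrix} A^{k-1}B & A^{k-2}B & \cdots & B \end{bmatrix}
\begin{bmatrix} u(i) \\ \vdots \\ u(i+k-1) \end{bmatrix}
$$

so that

$$
X_f = A^k X_p + \mathcal{C}_k U_p \tag{6.49}
$$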


where $\mathcal{C}_k = [\,A^{k-1}B\ \ A^{k-2}B\ \cdots\ B\,]$ is the reversed extended reachability matrix. Since $\mathrm{rank}(\mathcal{C}_k U_p) = n$ and $\mathrm{span}(X_p) \cap \mathrm{span}(U_p) = \{0\}$ by Assumption 6.1, we see from (6.49) that $\mathrm{rank}(X_f) \geq n$. But, by definition, $\mathrm{rank}(X_f) \leq n$, so that we have $\mathrm{rank}(X_f) = n$. Moreover, from (6.46) and (6.47),

$$
X_p = \mathcal{O}_k^{\dagger} Y_p - \mathcal{O}_k^{\dagger} \Psi_k U_p \in \mathrm{span}(W_p) \tag{6.50}
$$
$$
X_f = \mathcal{O}_k^{\dagger} Y_f - \mathcal{O}_k^{\dagger} \Psi_k U_f \in \mathrm{span}(W_f) \tag{6.51}
$$

where $\mathcal{O}_k^{\dagger}$ is the pseudo-inverse of $\mathcal{O}_k$. Thus, from (6.50) and (6.49), $\mathrm{span}(X_f) \subset \mathrm{span}(W_p)$. It therefore follows from (6.51) that

$$
\mathrm{span}(X_f) \subset \mathrm{span}(W_p) \cap \mathrm{span}(W_f)
$$

We show that the dimension of the space on the right-hand side of the above relation is equal to $n$. From Lemma 6.3, we have $\dim(W_p) = \dim(W_f) = km + n$ and $\dim(W_p \vee W_f) = 2km + n$, where $\dim(\cdot)$ denotes the dimension of the row space. On the other hand, for dimensions of subspaces, the following identity holds:

$$
\dim(W_p \cap W_f) = \dim(W_p) + \dim(W_f) - \dim(W_p \vee W_f)
$$

Thus we have $\dim(W_p \cap W_f) = n$. Since $\mathrm{span}(X_f)$ is an $n$-dimensional subspace of this $n$-dimensional intersection, the two coincide. This completes the proof.

⁵ N4SID = Numerical Algorithms for Subspace State Space System Identification.
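The dimension count in the proof of Lemma 6.7 can be reproduced on simulated data. The sketch below uses the system of Example 6.6 with a white-noise input. It computes numerical ranks with a relative threshold (an arbitrary $10^{-9}$) to obtain $\dim(W_p \cap W_f)$ and to check that the rows of $X_f$ lie in both row spaces.

```python
import numpy as np

A = np.array([[0.6, 0.4], [-0.4, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])
n = 2

rng = np.random.default_rng(7)
k, N = 4, 300
u = rng.standard_normal((N + 2 * k - 1, 1))
x, xs, ys = rng.standard_normal(n), [], []
for ut in u:
    xs.append(x)
    ys.append(C @ x)
    x = A @ x + B @ ut
xs, ys = np.array(xs), np.array(ys)

def hank(w, s):
    return np.vstack([w[s + i:s + i + N].T for i in range(k)])

Wp = np.vstack([hank(u, 0), hank(ys, 0)])        # past data W_p
Wf = np.vstack([hank(u, k), hank(ys, k)])        # future data W_f
Xf = xs[k:k + N].T                               # X_f = X_k

def rank(M):
    return np.linalg.matrix_rank(M, tol=1e-9 * np.linalg.norm(M, 2))

dim_cap = rank(Wp) + rank(Wf) - rank(np.vstack([Wp, Wf]))
in_past = rank(np.vstack([Wp, Xf])) == rank(Wp)
in_future = rank(np.vstack([Wf, Xf])) == rank(Wf)
print(dim_cap, in_past, in_future)
```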


Since $W_p$ and $W_f$ are the past and future data matrices, respectively, Lemma 6.7 means that the row vectors of $X_f$ form a basis of the intersection of the past and future subspaces. Hence, the state vector plays the role of a memory for exchanging information between the past and the future. The state $X_f$ can be computed by the SVD of $\begin{bmatrix} W_p \\ W_f \end{bmatrix} \in \mathbb{R}^{2k(m+p) \times N}$; see [118].
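Consider the LQ decomposition

$$
\begin{bmatrix} U_f \\ U_p \\ Y_p \\ Y_f \end{bmatrix}
= \begin{bmatrix}
L_{11} & 0 & 0 & 0 \\
L_{21} & L_{22} & 0 & 0 \\
L_{31} & L_{32} & L_{33} & 0 \\
L_{41} & L_{42} & L_{43} & L_{44}
\end{bmatrix}
\begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \\ Q_4^T \end{bmatrix} \tag{6.52}
$$

where $L_{11}, L_{22} \in \mathbb{R}^{km \times km}$ and $L_{33}, L_{44} \in \mathbb{R}^{kp \times kp}$ are lower triangular, and where $Q_1, Q_2 \in \mathbb{R}^{N \times km}$ and $Q_3, Q_4 \in \mathbb{R}^{N \times kp}$ are orthogonal. Then, we have the following theorem, which is an extended version of Lemma 6.5.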


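Theorem 6.2. Suppose that A1) to A3) of Assumption 6.1 hold with $k$ replaced by $2k$, so that the PE condition of order $2k$ holds. Then, for the LQ decomposition (6.52), we have

$$
\mathrm{rank}(L_{42}) = n, \quad \mathrm{rank}(L_{43}) = n, \quad
\mathrm{rank}\begin{bmatrix} L_{33} \\ L_{43} \end{bmatrix} = n, \quad
\mathrm{rank}\begin{bmatrix} L_{42} & L_{43} \end{bmatrix} = n
$$

Moreover, it follows that $L_{44} = 0$ and hence $\mathrm{rank}(L_{33}) = n$.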
Proof. Recall from Lemma 6.5 that each column vector in the L-matrix of (6.52) is an input-output pair, though the blocks $U_f$ and $U_p$ are interchanged. Also, we see that the three block matrices $L_{12}$, $L_{13}$, $L_{14}$ corresponding to the future inputs are zero. This implies that each column of the last three block columns of the L-matrix is an input-output pair with zero future inputs. In particular, the columns of $L_{42}$, $L_{43}$, $\begin{bmatrix} L_{33} \\ L_{43} \end{bmatrix}$, $L_{44}$, and $[\,L_{42}\ \ L_{43}\,]$ consist of zero-input responses. Since the number of independent zero-input responses equals the dimension of the system, we have all the rank conditions stated in this theorem. Also, we see that $L_{44} = 0$, since the past inputs and outputs, together with the future inputs generating it, are zero ($L_{14} = 0$, $L_{24} = 0$, $L_{34} = 0$).

From (6.24) with $k := 2k$, the rank of the left-hand side of (6.52) is $2km + n$, and so is the rank of the L-matrix. Since $\mathrm{rank}(L_{11}) = \mathrm{rank}(L_{22}) = km$ and $L_{44} = 0$, we see that $\mathrm{rank}(L_{33}) = n$. This completes the proof.


We are now in a position to present a theorem that provides a basis of the N4SID method.

Figure 6.2. Oblique projection (figure labels: $\mathrm{span}\{U_f\}$, $\mathrm{span}\{W_p\}$, $\Psi_k U_f$, $\mathcal{O}_k X_f$)


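Theorem 6.3. [165] Suppose that A1) to A3) of Assumption 6.1 hold with $k$ replaced by $2k$. Let the oblique projection of $Y_f$ onto $W_p$ along $U_f$ be given by

$$
\xi = \hat E_{\parallel U_f}\{Y_f \mid W_p\} \tag{6.53}
$$

(see Figure 6.2). Also, let the SVD of $\xi$ be given by

$$
\xi = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = U_1 \Sigma_1 V_1^T \tag{6.54}
$$

Then, we have the following results:

$$
n = \dim \Sigma_1 \tag{6.55}
$$
$$
\xi = \mathcal{O}_k X_f \in \mathbb{R}^{kp \times N} \tag{6.56}
$$
$$
\mathcal{O}_k = U_1 \Sigma_1^{1/2} T \in \mathbb{R}^{kp \times n}, \quad |T| \neq 0 \tag{6.57}
$$
$$
X_f = T^{-1} \Sigma_1^{1/2} V_1^T \tag{6.58}
$$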
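Proof. Since $L_{44} = 0$, the future output $Y_f$ is completely determined by the past data $W_p$ and the future inputs $U_f$. Thus, $\hat Y_f = Y_f$ in Figure 6.2.⁶

For convenience, we rewrite (6.52) as

$$
\begin{bmatrix} U_f \\ W_p \\ Y_f \end{bmatrix}
= \begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix}
\begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \end{bmatrix} \tag{6.59}
$$

where $R_{11} = L_{11}$, $R_{21} = \begin{bmatrix} L_{21} \\ L_{31} \end{bmatrix}$, $R_{22} = \begin{bmatrix} L_{22} & 0 \\ L_{32} & L_{33} \end{bmatrix}$, $R_{31} = L_{41}$, $R_{32} = [\,L_{42}\ \ L_{43}\,]$, and $R_{33} = L_{44} = 0$. In (6.59), $Q_2$ stands for $[\,Q_2\ \ Q_3\,]$ and $Q_3$ for $Q_4$ of (6.52).

From Theorem 6.2, we see that $\mathrm{rank}(R_{22}) = \mathrm{rank}\begin{bmatrix} L_{22} & 0 \\ L_{32} & L_{33} \end{bmatrix} = km + n$. This is less than $k(m+p)$, so that $R_{22}$ is rank deficient. Also, it follows from $\mathrm{rank}(L_{22}) = km$ and the third condition in Theorem 6.2 that (see Problem 6.6)

$$
\mathrm{Ker}(R_{22}) \subset \mathrm{Ker}(R_{32}) \tag{6.60}
$$

Now from (6.59), we have

$$
R_{22} Q_2^T = W_p - R_{21} Q_1^T
$$

Thus, there exists a $\Xi \in \mathbb{R}^{k(p+m) \times N}$ such that

$$
Q_2^T = R_{22}^{\dagger} (W_p - R_{21} Q_1^T) + (I_{k(p+m)} - R_{22}^{\dagger} R_{22}) \Xi \tag{6.61}
$$

where $R_{22}^{\dagger}$ is the pseudo-inverse defined in Lemma 2.10; see also Lemma 2.11. From the third relation of (6.59), we have $Y_f = R_{31} Q_1^T + R_{32} Q_2^T$, so that, by using (6.61) and $Q_1^T = R_{11}^{-1} U_f$,

$$
\begin{aligned}
Y_f ={}& (R_{31} - R_{32} R_{22}^{\dagger} R_{21}) R_{11}^{-1} U_f + R_{32} R_{22}^{\dagger} W_p \\
&+ R_{32} (I_{k(p+m)} - R_{22}^{\dagger} R_{22}) \Xi
\end{aligned} \tag{6.62}
$$

But, from (6.60), $R_{32} (I_{k(p+m)} - R_{22}^{\dagger} R_{22}) = 0$, since $\Pi := I_{k(p+m)} - R_{22}^{\dagger} R_{22}$ is the orthogonal projection onto $\mathrm{Ker}(R_{22})$. Thus, (6.62) reduces to

$$
Y_f = (R_{31} - R_{32} R_{22}^{\dagger} R_{21}) R_{11}^{-1} U_f + R_{32} R_{22}^{\dagger} W_p \tag{6.63}
$$

where $\mathrm{span}(U_f) \cap \mathrm{span}(W_p) = \{0\}$ from A3) of Assumption 6.1. It thus follows that the right-hand side of (6.63) is a direct sum of two oblique projections: that of $Y_f$ onto $\mathrm{span}(U_f)$ along $\mathrm{span}(W_p)$, and that of $Y_f$ onto $\mathrm{span}(W_p)$ along $\mathrm{span}(U_f)$.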

On the other hand, from (6.47),

$$
Y_f = \Psi_k U_f + \mathcal{O}_k X_f \tag{6.64}
$$

Again, from A3) of Assumption 6.1, $\mathrm{span}(U_f) \cap \mathrm{span}(X_f) = \{0\}$. Hence the right-hand side of (6.64) is the direct sum of two oblique projections: that of $Y_f$ onto $\mathrm{span}(U_f)$ along $\mathrm{span}(X_f)$, and that of $Y_f$ onto $\mathrm{span}(X_f)$ along $\mathrm{span}(U_f)$. Since $\mathrm{span}(X_f) \subset \mathrm{span}(W_p)$ by Lemma 6.7, we compare (6.63) and (6.64) and use the uniqueness of the direct sum decomposition. This gives the desired result, and hence

$$
\xi = \hat E_{\parallel U_f}\{Y_f \mid W_p\} = R_{32} R_{22}^{\dagger} W_p = \mathcal{O}_k X_f \tag{6.65}
$$

This proves (6.56). Equations (6.55), (6.57), and (6.58) are obvious from (6.54).

⁶ Note that if the output $y$ is disturbed by noise, then we have $L_{44} \neq 0$, implying that $\hat Y_f \neq Y_f$.
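Theorem 6.3 and the expression (6.66) can be exercised numerically through the regrouped LQ factors (6.59), using noise-free data from the system of Example 6.6. Because $R_{22}$ is rank deficient, products with $R_{22}^{\dagger}$ are computed as minimum-norm least-squares solutions. These ignore singular values below $10^{-10}$ times the largest one; this cutoff and the helper name `pinv_apply` are assumptions of the sketch. The result is compared with $\mathcal{O}_k X_f$ and with the first block column $D, CB, CAB, \ldots$ of (6.67).

```python
import numpy as np

A = np.array([[0.6, 0.4], [-0.4, 0.6]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])          # D = 0
n, m, p = 2, 1, 1

rng = np.random.default_rng(8)
k, N = 5, 400
u = rng.standard_normal((N + 2 * k - 1, m))
x, xs, ys = rng.standard_normal(n), [], []
for ut in u:
    xs.append(x)
    ys.append(C @ x)
    x = A @ x + B @ ut
xs, ys = np.array(xs), np.array(ys)

def hank(w, s):
    return np.vstack([w[s + i:s + i + N].T for i in range(k)])

def pinv_apply(M, rhs, cutoff=1e-10):
    """Minimum-norm least-squares solution, i.e. M^dagger @ rhs with a singular-value cutoff."""
    return np.linalg.lstsq(M, rhs, rcond=cutoff)[0]

Up, Yp, Uf, Yf = hank(u, 0), hank(ys, 0), hank(u, k), hank(ys, k)
Wp = np.vstack([Up, Yp])

# LQ decomposition (6.52) of [U_f; U_p; Y_p; Y_f], regrouped as in (6.59).
_, R = np.linalg.qr(np.vstack([Uf, Up, Yp, Yf]).T)
L = R.T
r1, r2 = k * m, k * m + k * (m + p)               # block boundaries
R11, R21, R22 = L[:r1, :r1], L[r1:r2, :r1], L[r1:r2, r1:r2]
R31, R32 = L[r2:, :r1], L[r2:, r1:r2]

xi = R32 @ pinv_apply(R22, Wp)                    # oblique projection (6.65)
s = np.linalg.svd(xi, compute_uv=False)
n_hat = int(np.sum(s > 1e-8 * s[0]))              # (6.55)

Ok = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(k)])
Xf = xs[k:k + N].T
print(n_hat, np.abs(xi - Ok @ Xf).max())          # (6.56): xi = O_k X_f

Psi_hat = (R31 - R32 @ pinv_apply(R22, R21)) @ np.linalg.inv(R11)   # (6.66)
markov = [0.0] + [(C @ np.linalg.matrix_power(A, i) @ B)[0, 0] for i in range(k - 1)]
print(np.round(Psi_hat[:, 0], 6), np.round(markov, 6))             # D, CB, CAB, ...
```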


It follows from Theorem 6.3 that the estimates of $A$ and $C$ can be obtained from the extended observability matrix of (6.57). Also, we see from (6.63) and (6.64) that

$$
\Psi_k = (R_{31} - R_{32} R_{22}^{\dagger} R_{21}) R_{11}^{-1} \tag{6.66}
$$

holds. Hence, by the definition of $\Psi_k$,

$$
\begin{bmatrix}
D & 0 & \cdots & 0 \\
CB & D & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
CA^{k-2}B & CA^{k-3}B & \cdots & D
\end{bmatrix} = (R_{31} - R_{32} R_{22}^{\dagger} R_{21}) R_{11}^{-1} \tag{6.67}
$$

which is similar to the expression (6.43). Thus we can apply the same method used in the MOESP algorithm of Lemma 6.6 to compute the estimates of $B$ and $D$.

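Van Overschee and De Moor [165] have developed a subspace method of identifying state space models by using the state vector given by (6.58). In fact, from (6.58), we have the estimate of the state vector

$$
\hat X_f = [\,\hat x(k)\ \ \hat x(k+1)\ \cdots\ \hat x(k+N-2)\ \ \hat x(k+N-1)\,] \tag{6.68}
$$

We define the following matrices with $N-1$ columns:

$$
\hat X_{k+1} := [\,\hat x(k+1)\ \cdots\ \hat x(k+N-1)\,] \tag{6.69a}
$$
$$
\hat X_k := [\,\hat x(k)\ \cdots\ \hat x(k+N-2)\,] \tag{6.69b}
$$
$$
U_{k|k} := [\,u(k)\ \cdots\ u(k+N-2)\,] \tag{6.69c}
$$
$$
Y_{k|k} := [\,y(k)\ \cdots\ y(k+N-2)\,] \tag{6.69d}
$$

Then, it follows that

$$
\begin{bmatrix} \hat X_{k+1} \\ Y_{k|k} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} \hat X_k \\ U_{k|k} \end{bmatrix} \tag{6.70}
$$

This is a system of linear equations for the system matrices, so that they can be estimated by applying the least-squares method:

$$
\begin{bmatrix} \hat A & \hat B \\ \hat C & \hat D \end{bmatrix}
= \begin{bmatrix} \hat X_{k+1} \\ Y_{k|k} \end{bmatrix}
\begin{bmatrix} \hat X_k \\ U_{k|k} \end{bmatrix}^T
\left( \begin{bmatrix} \hat X_k \\ U_{k|k} \end{bmatrix} \begin{bmatrix} \hat X_k \\ U_{k|k} \end{bmatrix}^T \right)^{-1}
$$

Summarizing the above, we have the following lemma.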
Lemma 6.8. (N4SID algorithm)
Step 1: Compute $\xi$ by using (6.65) and the LQ decomposition (6.59).
Step 2: Compute the state vector $\hat X_f$ from (6.58), and define $\hat X_{k+1}$, $\hat X_k$, $Y_{k|k}$, $U_{k|k}$ as in (6.69).
Step 3: Compute the matrices $A$, $B$, $C$, $D$ by solving the regression equation (6.70) using the least-squares technique.
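The three steps of Lemma 6.8 can be sketched in Python as follows, again with noise-free data from the system of Example 6.6. The sketch takes $T = I$ in (6.57) and (6.58) and solves (6.70) with NumPy's least-squares routine. It uses an explicit cutoff for $R_{22}^{\dagger}$ and a relative threshold for the order, both of which are assumptions. The estimates are compared through Markov parameters, which do not depend on the state basis.

```python
import numpy as np

A0 = np.array([[0.6, 0.4], [-0.4, 0.6]])
B0 = np.array([[0.0], [1.0]])
C0 = np.array([[1.0, 0.5]])
D0 = np.zeros((1, 1))
m, p = 1, 1

rng = np.random.default_rng(9)
k, N = 5, 400
u = rng.standard_normal((N + 2 * k - 1, m))
x, ys = rng.standard_normal(2), []
for ut in u:
    ys.append(C0 @ x + D0 @ ut)
    x = A0 @ x + B0 @ ut
ys = np.array(ys)

def hank(w, s):
    return np.vstack([w[s + i:s + i + N].T for i in range(k)])

Up, Yp, Uf, Yf = hank(u, 0), hank(ys, 0), hank(u, k), hank(ys, k)
Wp = np.vstack([Up, Yp])

# Step 1: LQ decomposition (6.59) and the oblique projection xi of (6.65).
_, R = np.linalg.qr(np.vstack([Uf, Up, Yp, Yf]).T)
L = R.T
r1, r2 = k * m, k * m + k * (m + p)
R22, R32 = L[r1:r2, r1:r2], L[r2:, r1:r2]
xi = R32 @ np.linalg.lstsq(R22, Wp, rcond=1e-10)[0]   # R32 R22^dagger W_p

# Step 2: SVD (6.54) and the state estimate (6.58) with T = I.
_, s, Vt = np.linalg.svd(xi, full_matrices=False)
n = int(np.sum(s > 1e-8 * s[0]))
Xf = np.sqrt(s[:n])[:, None] * Vt[:n]          # Sigma_1^{1/2} V_1^T
X_next, X_cur = Xf[:, 1:], Xf[:, :-1]          # (6.69a), (6.69b)
U_kk = u[k:k + N - 1].T                        # (6.69c)
Y_kk = ys[k:k + N - 1].T                       # (6.69d)

# Step 3: least-squares solution of the regression equation (6.70).
lhs = np.vstack([X_next, Y_kk])
rhs = np.vstack([X_cur, U_kk])
Theta = np.linalg.lstsq(rhs.T, lhs.T, rcond=None)[0].T
A, B = Theta[:n, :n], Theta[:n, n:]
C, D = Theta[n:, :n], Theta[n:, n:]

def markov(A, B, C, count=5):
    return np.array([(C @ np.linalg.matrix_power(A, i) @ B)[0, 0] for i in range(count)])

print(n, np.round(markov(A, B, C), 6), np.round(markov(A0, B0, C0), 6), np.round(D, 6))
```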


Remark 6.1. This is a slightly modified version of Algorithm 1 in Chapter 3 of [165]. 
The LQ decomposition of (6.52) has been employed by Verhaegen [171] to develop 
the PO-MOESP algorithm. Here we have used it to compute the oblique projections. 
The LQ decomposition is frequently used in Chapters 9 and 10 in order to compute 
orthogonal and oblique projections. 


6.7 SVD and Additive Noises 


Up to now, we have presented deterministic realization results under the assumption that complete noise-free input-output data are available; however, it is well known that real data are corrupted by noise. Thus, in this section, we consider the SVD of a data matrix corrupted by uniform white noise. We show that the left singular vectors of a wide rectangular matrix are not very sensitive to additive white noise.

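Consider a real rectangular matrix $X \in \mathbb{R}^{M \times N}$ with $M < N$. Let $\mathrm{rank}(X) = r < M$, and let the SVD of $X/\sqrt{N}$ be given by

$$
\frac{1}{\sqrt{N}} X = U \Sigma V^T = \begin{bmatrix} U_r & U_n \end{bmatrix}
\begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_r^T \\ V_n^T \end{bmatrix} = U_r \Sigma_r V_r^T \tag{6.71}
$$

where $\Sigma_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, and where

$$
\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \cdots = \sigma_M = 0
$$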


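The matrices $U \in \mathbb{R}^{M \times M}$, $V \in \mathbb{R}^{N \times N}$ are orthogonal, and $U_r := U(1:M, 1:r)$, $V_r := V(1:N, 1:r)$. From (6.71),

$$
\frac{1}{N} X X^T = U \Sigma \Sigma^T U^T = U \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} U^T \in \mathbb{R}^{M \times M} \tag{6.72}
$$

and hence

$$
\frac{1}{N} X X^T u_i = \sigma_i^2 u_i, \qquad i = 1, \ldots, r
$$

We see that the left singular vectors $u_i$ of $X/\sqrt{N}$ are the eigenvectors of $XX^T/N$, so that we have

$$
\sigma_i^2 = \lambda_i(XX^T/N), \qquad i = 1, \ldots, r
$$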


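We consider the effect of white noise on the SVD of $X$. Let $X$ be perturbed by white noise $\Xi$. Then, the observed data $Y$ is expressed as

$$
Y = X + \Xi \tag{6.73}
$$

where the element $\xi_{ij}$ of $\Xi$ is white noise with mean 0 and variance $\sigma_\xi^2$. Thus, for sufficiently large $N$, since the noise is uncorrelated with $X$, we have

$$
\frac{1}{N} Y Y^T = \frac{1}{N} X X^T + \frac{1}{N}\left(X \Xi^T + \Xi X^T\right) + \frac{1}{N} \Xi \Xi^T \approx \frac{1}{N} X X^T + \sigma_\xi^2 I_M
$$

Hence, from (6.72), $YY^T/N$ is approximated as

$$
\frac{1}{N} Y Y^T \approx U \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} U^T + \sigma_\xi^2 I_M
= U \left( \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} + \sigma_\xi^2 I_M \right) U^T \tag{6.74}
$$

where $N$ is sufficiently large and where $U (\sigma_\xi^2 I_M) U^T = \sigma_\xi^2 I_M$ is used. Defining $S_r^2 = \Sigma_r^2 + \sigma_\xi^2 I_r$, (6.74) becomes

$$
\frac{1}{N} Y Y^T \approx \begin{bmatrix} U_r & U_n \end{bmatrix}
\begin{bmatrix} S_r^2 & 0 \\ 0 & \sigma_\xi^2 I_{M-r} \end{bmatrix}
\begin{bmatrix} U_r^T \\ U_n^T \end{bmatrix} = U S^2 U^T \tag{6.75}
$$

where $S = \mathrm{diag}(s_1, \ldots, s_M)$, $s_i > 0$, with

$$
s_i^2 = \begin{cases} \sigma_i^2 + \sigma_\xi^2, & i = 1, \ldots, r \\ \sigma_\xi^2, & i = r+1, \ldots, M \end{cases} \tag{6.76}
$$

as shown in Figure 6.3. It should be noted that the right-hand side of (6.75) is the eigenvalue decomposition of the sample covariance matrix of $Y$.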


Figure 6.3. Singular values of $Y/\sqrt{N}$ and $X/\sqrt{N}$ (singular value $s_i$ plotted against the index $i$)


Lemma 6.9. Suppose that the variance $\sigma_\xi^2$ of the white noise $\xi_{ij}$ is relatively small, and that $N$ is sufficiently large. Then, for the SVDs of $X/\sqrt{N}$ and $Y/\sqrt{N}$, the following (i) to (iii) hold.

(i) The singular values of $Y/\sqrt{N}$ are given by $(s_1, s_2, \ldots, s_M)$, where

$$
s_1 \geq s_2 \geq \cdots \geq s_r > s_{r+1} \approx \cdots \approx s_M \approx \sigma_\xi
$$

Thus, the $M - r$ smallest singular values of $Y/\sqrt{N}$ are nearly equal to $\sigma_\xi$.

(ii) The eigenvalues of $XX^T/N$ are obtained by $\sigma_i^2 = s_i^2 - \sigma_\xi^2$, $i = 1, \ldots, r$.

(iii) The left singular vectors of $Y/\sqrt{N}$ are close to those of $X/\sqrt{N}$. Hence, we know the left singular vectors $u_1, \ldots, u_r$ of $X/\sqrt{N}$ from the left singular vectors corresponding to the $r$ singular values $s_1 \geq s_2 \geq \cdots \geq s_r$ of $Y/\sqrt{N}$. This fact implies that information about $X$ can be extracted from the noise-corrupted observation $Y$ by using the SVD.


Proof. See [40, 162]. 
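Lemma 6.9 is easy to observe in simulation. The sketch below uses a rank-3 matrix $X$ for which the singular values of $X/\sqrt{N}$ are close to 3, 2 and 1.5, a noise level $\sigma_\xi = 0.5$, and sizes $M = 8$, $N = 20000$. All of these are arbitrary illustrative choices. The three printed lines correspond to items (i), (ii), and (iii).

```python
import numpy as np

rng = np.random.default_rng(10)
M, N, r, sig = 8, 20000, 3, 0.5
Ur0 = np.linalg.qr(rng.standard_normal((M, r)))[0]                # orthonormal M x r basis
X = Ur0 @ np.diag([3.0, 2.0, 1.5]) @ rng.standard_normal((r, N))  # rank-r signal
Y = X + sig * rng.standard_normal((M, N))                         # (6.73)

Ux, sx, _ = np.linalg.svd(X / np.sqrt(N), full_matrices=False)
Uy, sy, _ = np.linalg.svd(Y / np.sqrt(N), full_matrices=False)

print(np.round(sy[r:], 3))                                        # (i): close to sig = 0.5
print(np.round(np.sqrt(sy[:r]**2 - sig**2), 3), np.round(sx[:r], 3))   # (ii)
cosines = np.linalg.svd(Ux[:, :r].T @ Uy[:, :r], compute_uv=False)
print(np.round(cosines, 4))                                       # (iii): principal-angle cosines near 1
```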


Remark 6.2. We see from Lemma 6.9 that the information about $X$ is contained in the $r$ left singular vectors corresponding to the $r$ largest singular values of $Y$. No information is contained in the singular vectors corresponding to the smaller singular values. Hence, the subspace spanned by the left singular vectors $U_r = [u_1, \ldots, u_r]$ corresponding to the first $r$ singular values is called the signal subspace. Also, the subspace spanned by $U_n = [u_{r+1}, \ldots, u_M]$ is called the noise subspace associated with the space spanned by $Y$. It should be noted here that noise subspaces are no less important than signal subspaces in applications. In fact, noise subspaces are successfully used for solving blind identification problems [156, 162].


We give a numerical example which is related to a frequency estimation problem 
based on noisy data. This clearly shows the meaning of Lemma 6.9. 


Example 6.7. Consider a simple sinusoidal model

$$
x(t) = a_1 \sin(\omega_1 t + \varphi_1) + a_2 \sin(\omega_2 t + \varphi_2)
$$

where $a_1 = 10$ and $a_2 = 5$ denote the amplitudes, and $\omega_1 = 0.24\pi$, $\omega_2 = 0.26\pi$ are two adjacent angular frequencies. The phases $\varphi_1$, $\varphi_2$ are random variables with uniform distribution on the interval $(-\pi, \pi)$. We assume that the observation is given by

$$
y(t) = x(t) + e(t)
$$

where $e$ is a Gaussian white noise with mean zero and variance $\sigma_e^2$.

For random initial phases $\varphi_1$, $\varphi_2$ and a Gaussian white noise $e$, we generated the data $y(t)$, $t = 1, \ldots, 1024$. Taking $k = 16$, we formed $X$ and $Y$, and computed the singular values $\sigma_i$ and $s_i$ of $X/\sqrt{N}$ and $Y/\sqrt{N}$, respectively; the results are shown in Table 6.4.

We observe that the singular values $s_i$, $i \geq 5$, of $Y/\sqrt{N}$ decrease very slowly as the index $i$ increases. In this case, it is not difficult to determine $\mathrm{rank}(X/\sqrt{N}) = 4$ based on the distribution of the singular values of $Y/\sqrt{N}$. If the noise variance of $e$ increases, the decision becomes difficult. Also, if the difference $|\omega_1 - \omega_2|$ of the two angular frequencies gets larger, the singular values $s_3$, $s_4$ become larger, while $s_i$, $i \geq 5$, remain almost the same. Thus, if the difference of the two angular frequencies is expanded, the rank determination of $X/\sqrt{N}$ becomes easier.
Table 6.4. Singular values of $X/\sqrt{N}$ and $Y/\sqrt{N}$

i      1         2         3        4        5        6        ...   16
σ_i    22.6323   22.0416   2.5283   2.5128   0        0        ...   0
s_i    22.6903   22.1013   2.7308   2.7276   1.1132   1.0769   ...   0.9046
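A simulation in the spirit of Example 6.7 is sketched below. The text does not state the noise variance, so $\sigma_e = 1$ is an assumption here. The random seed and the construction of $X$ and $Y$ as $16 \times N$ Hankel matrices of $x(t)$ and $y(t)$ are also assumptions. The printed values will therefore differ in detail from Table 6.4, but the same pattern (four dominant singular values, the remaining ones close to $\sigma_e$) is expected.

```python
import numpy as np

rng = np.random.default_rng(11)
a1, a2 = 10.0, 5.0
w1, w2 = 0.24 * np.pi, 0.26 * np.pi
phi1, phi2 = rng.uniform(-np.pi, np.pi, size=2)
sigma_e = 1.0                                     # assumed noise standard deviation

t = np.arange(1, 1025)
x = a1 * np.sin(w1 * t + phi1) + a2 * np.sin(w2 * t + phi2)
y = x + sigma_e * rng.standard_normal(t.size)

k = 16
N = t.size - k + 1                                # number of columns of the Hankel matrices

def hankel(sig):
    """k x N Hankel matrix whose row i is sig(i), ..., sig(i+N-1)."""
    return np.array([sig[i:i + N] for i in range(k)])

sx = np.linalg.svd(hankel(x) / np.sqrt(N), compute_uv=False)
sy = np.linalg.svd(hankel(y) / np.sqrt(N), compute_uv=False)
print(np.round(sx, 4))
print(np.round(sy, 4))
rank_x = int(np.sum(sx > 1e-6 * sx[0]))           # numerical rank of X / sqrt(N)
print(rank_x)
```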


6.8 Notes and References 


• The germ of realization theory is found in the papers by Gilbert [56] and Kalman [82]. Then Ho and Kalman [72] first solved the deterministic realization problem and derived a method of constructing a discrete-time state space model based on a given impulse response sequence; see also Kalman, Falb and Arbib [85]. Zeiger and McEwen [184] further studied this algorithm based on the SVD to make its numerical implementation easier. Other references cited in this chapter are review articles [162, 175] and books [59, 147, 157].


• Two realization problems are stated in Section 6.1. One is the classical realization problem of recovering state space models from given impulse responses. The other is the deterministic identification problem of constructing state space models from observed input and output data. The classical solution based on the SVD of the Hankel matrix formed by the given impulse responses is described in Section 6.2. A program of the Ho-Kalman algorithm of Lemma 6.1 is provided in Table D.1 of Appendix D.


• A crucial problem of the realization method based on the infinite Hankel matrix is that it is necessary to assume a priori that the Hankel matrix has finite rank. In fact, it is impossible to determine the rank of an infinite-dimensional matrix in finitely many steps, and it is also not practical to assume that infinitely many impulse responses are available. The realization problem based on finite impulse response matrices is related to the partial realization problem; see Theorem 3.14.


• In Section 6.3, we have defined the data matrix generated by the input-output observations and introduced basic assumptions on the inputs and the system. Some examples show that information about the image of the extended observability matrix can be retrieved from the data matrix [162]. Lemma 6.3 is due to Moonen et al. [118, 119], but essentially the same result is proved in Gopinath [62]; the relation of [62] to subspace methods has been explored in [177]. The proof of Lemma 6.3 is based on the author's review paper [86], and Lemma 6.4 is adapted from the technical report [181].


• In Section 6.4, we have shown that the image of the extended observability matrix can be extracted by the LQ decomposition of the data matrix followed by the SVD. Lemma 6.5 provides a system-theoretic interpretation of the L-matrix, each column of which is an input-output pair of the system.


• Two subspace identification methods, the MOESP method [172, 173] and the N4SID method [164, 165], are introduced in Sections 6.5 and 6.6, respectively. The proof of the N4SID method is based on Lemma 6.5 and a new Theorem 6.2. Some numerical results using the MOESP method are also included.


• In Section 6.7, based on [40, 150, 162], we considered the SVD of wide rectangular matrices and the influence of white noise on the SVD, and defined signal and noise subspaces. It is shown that, because the SVD is robust to white noise, the column space of the unknown signal can be recovered from the column space of the noise-corrupted observed signal. Lemma 6.9 gives the basis of the MUSIC method; for more details, see [150, 162].
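The SVD-based realization recalled in these notes can be sketched in a few lines of NumPy. This is a minimal illustration and not the program of Table D.1. The function name `ho_kalman`, the Hankel size `k`, the assumption that the order `n` is known, and the test system are all hypothetical. In the code, `G[i]` stores the Markov parameter $CA^{i}B$. The state matrix is obtained from the shift invariance of the extended observability matrix.

```python
import numpy as np

def ho_kalman(G, n, k):
    """SVD-based realization from Markov parameters G[i] = C A^i B, i = 0, ..., 2k-2.

    n is the assumed system order, k the number of block rows/columns of the Hankel matrix.
    """
    p, m = G[0].shape
    H = np.block([[G[i + j] for j in range(k)] for i in range(k)])  # kp x km block Hankel
    U, s, Vt = np.linalg.svd(H)
    S_half = np.diag(np.sqrt(s[:n]))
    Ok = U[:, :n] @ S_half            # extended observability matrix
    Ck = S_half @ Vt[:n, :]           # extended reachability matrix
    A = np.linalg.pinv(Ok[:-p, :]) @ Ok[p:, :]   # shift invariance: Ok[p:] = Ok[:-p] A
    B = Ck[:, :m]                     # first block column
    C = Ok[:p, :]                     # first block row
    return A, B, C

# Hypothetical second-order SISO test system
A0 = np.array([[0.5, 0.2], [0.0, -0.3]])
B0 = np.array([[1.0], [0.5]])
C0 = np.array([[1.0, 1.0]])
k = 4
G = [C0 @ np.linalg.matrix_power(A0, i) @ B0 for i in range(2 * k - 1)]

A, B, C = ho_kalman(G, n=2, k=k)
G_hat = [C @ np.linalg.matrix_power(A, i) @ B for i in range(10)]
G_true = [C0 @ np.linalg.matrix_power(A0, i) @ B0 for i in range(10)]
print(np.allclose(G_hat, G_true), np.sort(np.linalg.eigvals(A).real))
```

A realization is determined only up to a similarity transformation. The check therefore compares impulse responses and eigenvalues rather than the matrices themselves.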


6.9 Problems 


6.1 Find the realizations of the following sequences.

(a) Natural numbers: $(0, 1, 2, \ldots)$
(b) A periodic signal: $(0, 1, 0, -1, 0, 0, 1, 0, -1, 0, 0, \ldots)$


6.2 Compute the reachability and observability Gramians for $(A, B, C)$ obtained by the algorithm of Lemma 6.1 under the conditions of Lemma 6.2.


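6.3 Suppose that $A \in \mathbb{R}^{p\times N}$ and $B \in \mathbb{R}^{q\times N}$, where $p, q < N$. Let $\hat{E}\{A \mid B\}$ denote the orthogonal projection of the row vectors of $A$ onto the space spanned by the row vectors of $B$. Prove the following.
$$\hat{E}\{A \mid B\} = AB^T(BB^T)^{\dagger}B$$
If $B$ has full row rank, the pseudo-inverse is replaced by the inverse.

6.4 Let $A$ and $B$ be defined as in Problem 6.3. Consider the LQ decomposition of (6.32):
$$\begin{bmatrix} B \\ A \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}$$
Suppose that $B$ has full row rank. Show that the orthogonal projection is given by
$$\hat{E}\{A \mid B\} = L_{21}Q_1^T = L_{21}L_{11}^{-1}B = A(Q_1Q_1^T)$$
Let $B^{\perp}$ be the orthogonal complement of the space spanned by the row vectors of $B$. Then, the orthogonal projection of the row vectors of $A$ onto $B^{\perp}$ is expressed as
$$\hat{E}\{A \mid B^{\perp}\} = L_{22}Q_2^T = A(Q_2Q_2^T)$$

6.5 Suppose that $A \in \mathbb{R}^{p\times N}$, $B \in \mathbb{R}^{q\times N}$ and $C \in \mathbb{R}^{r\times N}$, where $p, q, r < N$. Suppose also that $B$ and $C$ have full row rank and $\mathrm{span}\{B\} \cap \mathrm{span}\{C\} = \{0\}$. (This condition corresponds to A3) in Assumption 6.1.) Then $\hat{E}_{\|C}\{A \mid B\}$, the oblique projection of the row vectors of $A$ onto $B$ along $C$, is expressed as
$$\hat{E}_{\|C}\{A \mid B\} = A\begin{bmatrix} B^T & C^T \end{bmatrix}\begin{bmatrix} BB^T & BC^T \\ CB^T & CC^T \end{bmatrix}^{-1}\begin{bmatrix} B \\ 0 \end{bmatrix}$$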


6.6 Prove (6.60). (Hint: Use Lemmas 6.4 and 6.5.) 
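The identities in Problems 6.3 to 6.5 can be checked numerically before they are proved. The sketch assumes NumPy and random matrices of hypothetical sizes. The LQ decomposition is computed from a reduced QR decomposition of the transposed matrix, which is one common way to obtain it. For Problem 6.5, the rows of the test matrix are chosen in $\mathrm{span}\{B\} + \mathrm{span}\{C\}$, so the oblique projection should return exactly the component along $B$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, r, N = 2, 3, 2, 50                 # hypothetical sizes with p, q, r < N
A = rng.standard_normal((p, N))
B = rng.standard_normal((q, N))          # full row rank with probability one
C = rng.standard_normal((r, N))

# Problem 6.3: orthogonal projection of the rows of A onto the row space of B
E_AB = A @ B.T @ np.linalg.pinv(B @ B.T) @ B

# Problem 6.4: LQ decomposition [B; A] = L Q^T via a reduced QR of the transpose
Qf, Rf = np.linalg.qr(np.vstack([B, A]).T)   # Qf is N x (q+p) with orthonormal columns
L = Rf.T                                     # lower triangular factor
L11, L21, L22 = L[:q, :q], L[q:, :q], L[q:, q:]
Q1, Q2 = Qf[:, :q], Qf[:, q:]

checks_64 = [
    np.allclose(E_AB, L21 @ Q1.T),
    np.allclose(E_AB, L21 @ np.linalg.solve(L11, B)),
    np.allclose(E_AB, A @ Q1 @ Q1.T),
    np.allclose(A - E_AB, L22 @ Q2.T),        # projection onto the complement of B
    np.allclose(L22 @ Q2.T, A @ Q2 @ Q2.T),
]

# Problem 6.5: rows of A2 lie in span{B} + span{C}; projecting onto B along C keeps M1 B
M1, M2 = rng.standard_normal((p, q)), rng.standard_normal((p, r))
A2 = M1 @ B + M2 @ C
Gram = np.block([[B @ B.T, B @ C.T], [C @ B.T, C @ C.T]])
E_obl = A2 @ np.hstack([B.T, C.T]) @ np.linalg.solve(Gram, np.vstack([B, np.zeros((r, N))]))
print(all(checks_64), np.allclose(E_obl, M1 @ B))
```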


7 


Stochastic Realization Theory (1) 


Stochastic realization theory provides a method of constructing Markov models that 
simulate a stationary stochastic process with a prescribed covariance matrix, and 
serves as a basis for the subspace identification methods. In this chapter, we present 
a method of stochastic realization by using the deterministic realization theory and 
a linear matrix inequality (LMI) satisfied by the state covariance matrix. We show 
that all solutions to the stochastic realization problem are derived from solutions 
of the LMI. Using the approach due to Faurre [45-47], we show that the positive 
realness of covariance matrices and the existence of Markov models are equivalent, 
and then derive matrix Riccati equations that compute the boundary solutions of the 
LMI. Moreover, we discuss results for strictly positive real conditions and present a stochastic realization algorithm based on finite covariance data.


7.1 Preliminaries 


Consider a second-order vector stationary process $y \in \mathbb{R}^p$ with zero mean and covariance matrices
$$\Lambda(l) = E\{y(t+l)y^T(t)\}, \quad l = 0, \pm 1, \ldots \tag{7.1}$$
where the covariance matrices satisfy the condition
$$\sum_{l=-\infty}^{\infty}\|\Lambda(l)\| < \infty \tag{7.2}$$
It therefore follows that the spectral density matrix of $y$ is given by
$$\Phi(z) = \sum_{l=-\infty}^{\infty}\Lambda(l)z^{-l} \quad (p\times p \text{ matrix}) \tag{7.3}$$


In the following, we assume that $y$ is regular and of full rank, in the sense that the spectral density matrix $\Phi(z)$ has full rank [68, 138].

The stochastic realization problem is to find all Markov models whose outputs simulate the given covariance data (7.1), or the spectral density matrix (7.3). In this chapter, we assume that infinite data $\{y(t),\ t = 0, \pm 1, \ldots\}$, or a complete sequence of covariances $\{\Lambda(l)\}$, is available. Our aim here is therefore to develop a theoretical foundation for the existence of stochastic realizations. The more realistic problem of identifying state space models from finite input-output data is left to later chapters.

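Let $t$ be the present time. Let the stacked infinite-dimensional vectors of the future and past be given by¹
$$f(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ y(t+2) \\ \vdots \end{bmatrix}, \qquad p(t) = \begin{bmatrix} y(t-1) \\ y(t-2) \\ y(t-3) \\ \vdots \end{bmatrix}$$
Then, the covariance matrix of the future and past is defined by
$$H = E\{f(t)p^T(t)\} = \begin{bmatrix} \Lambda(1) & \Lambda(2) & \Lambda(3) & \cdots \\ \Lambda(2) & \Lambda(3) & \Lambda(4) & \cdots \\ \Lambda(3) & \Lambda(4) & \Lambda(5) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{7.4}$$
and the auto-covariance matrices of the future and the past are respectively given by
$$T_+ = E\{f(t)f^T(t)\} = \begin{bmatrix} \Lambda(0) & \Lambda^T(1) & \Lambda^T(2) & \cdots \\ \Lambda(1) & \Lambda(0) & \Lambda^T(1) & \cdots \\ \Lambda(2) & \Lambda(1) & \Lambda(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{7.5}$$
and
$$T_- = E\{p(t)p^T(t)\} = \begin{bmatrix} \Lambda(0) & \Lambda(1) & \Lambda(2) & \cdots \\ \Lambda^T(1) & \Lambda(0) & \Lambda(1) & \cdots \\ \Lambda^T(2) & \Lambda^T(1) & \Lambda(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{7.6}$$
where $H$ is an infinite-dimensional block Hankel matrix, and $T_\pm$ are infinite-dimensional block Toeplitz matrices.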

As in the deterministic case, it is assumed that $\mathrm{rank}(H) = n < \infty$. Then, from the deterministic realization theory described in Section 6.2, there exists a minimal realization $(A, C, \bar{C}, \Lambda(0))$ satisfying
$$\Lambda(l) = \begin{cases} \Lambda(0), & l = 0 \\ CA^{l-1}\bar{C}^T, & l = 1, 2, \ldots \end{cases} \tag{7.7}$$


¹Though the present time $t$ is included in the future, we could include it in the past as well. Then, by definition, we obtain a model without a noise term in the output equation [2].
where $A \in \mathbb{R}^{n\times n}$ and $C, \bar{C} \in \mathbb{R}^{p\times n}$ are constant matrices. It should be noted from (7.2) and (7.7) that $A$ is stable. Thus, the deterministic state space realization
$$\left[\begin{array}{c|c} A & \bar{C}^T \\ \hline C & \Lambda(0) \end{array}\right]$$
is observable and reachable, and its impulse response matrices are given by $\{\Lambda(t),\ t = 0, 1, \ldots\}$. In the following, we say that $(A, C, \bar{C}^T)$ is minimal if $(C, A)$ is observable and $(A, \bar{C}^T)$ is reachable.

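Define the infinite-dimensional observability and reachability matrices as
$$\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \end{bmatrix}, \qquad \mathcal{C} = \begin{bmatrix} \bar{C}^T & A\bar{C}^T & A^2\bar{C}^T & \cdots \end{bmatrix}$$
Then, we see from (7.7) that the block Hankel matrix of (7.4) has the factorization
$$H = \mathcal{O}\mathcal{C} \tag{7.8}$$
This is exactly the factorization that appears in the deterministic realization of Section 6.2; see Theorem 6.1.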
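It can be shown from (7.7) that
$$\Lambda(l) = CA^{l-1}\bar{C}^T\,1(l-1) + \Lambda(0)\,\delta(l) + \bar{C}(A^T)^{-l-1}C^T\,1(-l-1), \quad l = 0, \pm 1, \ldots \tag{7.9}$$
where $\delta(l)$ is the Kronecker delta and
$$1(l) = \begin{cases} 1, & l = 0, 1, \ldots \\ 0, & l = -1, -2, \ldots \end{cases}$$
Hence, from (7.3), the spectral density matrix is expressed as
$$\begin{aligned} \Phi(z) &= \sum_{l=1}^{\infty}CA^{l-1}\bar{C}^Tz^{-l} + \Lambda(0) + \sum_{l=-\infty}^{-1}\bar{C}(A^T)^{-l-1}C^Tz^{-l} \\ &= \sum_{l=1}^{\infty}CA^{l-1}\bar{C}^Tz^{-l} + \Lambda(0) + \sum_{l=1}^{\infty}\bar{C}(A^T)^{l-1}C^Tz^{l} \\ &= C(zI - A)^{-1}\bar{C}^T + \tfrac{1}{2}\Lambda(0) + \bar{C}(z^{-1}I - A^T)^{-1}C^T + \tfrac{1}{2}\Lambda(0) \end{aligned} \tag{7.10}$$
If we define
$$Z(z) = C(zI - A)^{-1}\bar{C}^T + \tfrac{1}{2}\Lambda(0) \tag{7.11}$$
then the spectral density matrix satisfies
$$\Phi(z) = Z(z) + Z^T(z^{-1}) \tag{7.12}$$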


This is a well-known additive decomposition of the spectral density matrix. 
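The additive decomposition (7.12) is easy to confirm numerically. The check sums the covariance series (7.3), with $\Lambda(l)$ generated from (7.7), and compares the result with $Z(z) + Z^T(z^{-1})$ on the unit circle. The matrices below are hypothetical ($A$ is stable and $\Lambda(0)$ is symmetric). The infinite series is truncated at $|l| \le 200$, which is harmless here because $\rho(A) = 0.5$.

```python
import numpy as np

# Hypothetical data: A stable, Λ(0) symmetric
A = np.array([[0.5, 0.1], [0.0, -0.4]])
C = np.array([[1.0, 0.0], [0.5, 1.0]])
Cbar = np.array([[0.3, 0.2], [0.1, 0.4]])
Lam0 = np.array([[2.0, 0.3], [0.3, 1.5]])
n = A.shape[0]

def Lam(l):
    """Λ(l) from (7.7) and (7.9): C A^{l-1} Cbar^T for l > 0, and Λ(-l) = Λ(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l).T

def Z(z):
    """Z(z) = C (zI - A)^{-1} Cbar^T + Λ(0)/2, as in (7.11)."""
    return C @ np.linalg.solve(z * np.eye(n) - A, Cbar.T) + 0.5 * Lam0

errs = []
for w in np.linspace(-np.pi, np.pi, 7):
    z = np.exp(1j * w)
    Phi_series = sum(Lam(l) * z ** (-l) for l in range(-200, 201))  # truncated (7.3)
    Phi_split = Z(z) + Z(1 / z).T                                   # (7.12)
    errs.append(np.abs(Phi_series - Phi_split).max())
print(max(errs))
```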
Let $\rho := \rho(A)$ be the spectral radius of $A$. Since $A$ is stable, $0 \le \rho < 1$. Hence the right-hand side of (7.10) is absolutely convergent for $\rho < |z| < \rho^{-1}$. The spectral density matrix is therefore analytic in the annular domain $\rho < |z| < \rho^{-1}$, which includes the unit circle $|z| = 1$ (see Example 4.12). Let $\Phi(\omega) := \Phi(z)|_{z=e^{j\omega}}$. Then, we have
$$\begin{aligned} \Phi(\omega) &= Z(e^{j\omega}) + Z^T(e^{-j\omega}) \\ &= Z(e^{j\omega}) + Z^*(e^{j\omega}) \ge 0, \quad -\pi < \omega \le \pi \end{aligned} \tag{7.13}$$
For scalar systems, this is equivalently written as $\mathrm{Re}\,Z(e^{j\omega}) \ge 0$, where Re denotes the real part.


Definition 7.1. (Positive real matrix) A square matrix $Z(z)$ is positive real if the following conditions (i) and (ii) are satisfied:
(i) $Z(z)$ is analytic in $|z| > 1$.
(ii) $Z(z)$ satisfies (7.13).
If, together with item (i), the stronger condition
(ii′) $Z(e^{j\omega}) + Z^*(e^{j\omega}) > 0, \quad -\pi < \omega \le \pi$
holds, then $Z(z)$ is called strictly positive real. In this case, it follows that $\Phi(\omega) > 0$ for $-\pi < \omega \le \pi$, and such a $\Phi(z)$ is called coercive.
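For scalar data, condition (ii) of Definition 7.1 can be inspected on a frequency grid. The sketch assumes NumPy and uses the data of Examples 7.1 and 7.3, which appear later in this chapter. It evaluates $\Phi(\omega) = 2\,\mathrm{Re}\,Z(e^{j\omega})$. A grid can only suggest positive realness, not prove it. For these data, however, $\mathrm{Re}\,Z(e^{j\omega})$ increases with $\cos\omega$, so the minimum occurs at $\omega = \pi$, and the grid contains that point.

```python
import numpy as np

def min_phi_on_grid(a, c, cbar, lam0, num=2001):
    """Minimum over a grid of Φ(ω) = 2 Re Z(e^{jω}), Z(z) = c cbar / (z - a) + lam0 / 2."""
    w = np.linspace(-np.pi, np.pi, num)
    z = np.exp(1j * w)
    Zw = c * cbar / (z - a) + 0.5 * lam0
    return (2 * Zw.real).min()

m71 = min_phi_on_grid(1/3, 2.0, 2/3, 9/4)   # data of Example 7.1
m73 = min_phi_on_grid(1/3, 2.0, 2/3, 2.0)   # data of Example 7.3
print(m71, m73)
```

The first minimum should be close to $1/4$, consistent with strict positive realness. The second should be close to $0$, consistent with positive realness that is not strict.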


7.2 Stochastic Realization Problem 


In this section we introduce a forward Markov model for a stationary stochastic pro- 
cess, and define the stochastic realization problem due to Faurre [45]. 
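Consider a state space model of the form
$$x(t+1) = A_0x(t) + w(t) \tag{7.14a}$$
$$y(t) = C_0x(t) + v(t) \tag{7.14b}$$
where $y \in \mathbb{R}^p$ is the output vector and $x \in \mathbb{R}^n$ the state vector. The noises $w \in \mathbb{R}^n$ and $v \in \mathbb{R}^p$ are white, with mean zero and covariance matrices
$$E\left\{\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}\begin{bmatrix} w^T(s) & v^T(s) \end{bmatrix}\right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta_{ts} \tag{7.15}$$
It is assumed that $A_0$ is stable, $(C_0, A_0)$ is observable, and $(A_0, Q^{1/2})$ is reachable. In this case, the model of (7.14) is called a stationary Markov model, as discussed in Section 4.7 (see Figure 7.1).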

Let $\Pi = E\{x(t)x^T(t)\}$. Then, we have
$$\Pi = \sum_{k=0}^{\infty}A_0^kQ(A_0^T)^k$$
and hence $\Pi$ satisfies the Lyapunov equation (see Section 4.7)
$$\Pi = A_0\Pi A_0^T + Q \tag{7.16}$$
From Lemma 4.10, the covariance matrix of $x$ satisfies
$$E\{x(t+l)x^T(t)\} = \begin{cases} A_0^l\Pi, & l = 0, 1, \ldots \\ \Pi(A_0^T)^{-l}, & l = -1, -2, \ldots \end{cases} \tag{7.17}$$

Figure 7.1. Markov model
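Moreover, from Lemma 4.11, the covariance matrix of $y$ is given by
$$\Lambda(l) = \begin{cases} C_0A_0^{l-1}(A_0\Pi C_0^T + S), & l = 1, 2, \ldots \\ C_0\Pi C_0^T + R, & l = 0 \\ \Lambda^T(-l), & l = -1, -2, \ldots \end{cases} \tag{7.18}$$
Thus, comparing (7.7) and (7.18), we conclude that
$$A_0 = A \tag{7.19a}$$
$$C_0 = C \tag{7.19b}$$
$$A_0\Pi C_0^T + S = \bar{C}^T \tag{7.19c}$$
$$C_0\Pi C_0^T + R = \Lambda(0) \tag{7.19d}$$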

Note that $A$, $C$ and $\bar{C}^T$ on the right-hand side of (7.19) can be replaced by $T^{-1}AT$, $CT$ and $T^{-1}\bar{C}^T$, respectively, for any nonsingular matrix $T$. For simplicity, however, we assume that $T = I_n$.

Since $A$, $C$, $\bar{C}$ in (7.19) are given by the factorization (7.7), they are regarded as given data. Recall from Example 4.11 that these matrices are expressed as
$$\begin{aligned} A &= E\{x(t+1)x^T(t)\}\Pi^{-1} \\ C &= E\{y(t)x^T(t)\}\Pi^{-1} \\ \bar{C} &= E\{y(t)x^T(t+1)\} \end{aligned} \tag{7.20}$$
It follows from (7.16) and (7.19) that
$$\Pi - A\Pi A^T = Q \tag{7.21a}$$
$$\bar{C}^T - A\Pi C^T = S \tag{7.21b}$$
$$\Lambda(0) - C\Pi C^T = R \tag{7.21c}$$
where
$$\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \ge 0, \quad \Pi > 0 \tag{7.22}$$


For given data $(A, C, \bar{C}, \Lambda(0))$, the stochastic realization problem considered by Faurre [46, 47] is to find four covariance matrices $(\Pi, Q, R, S)$ satisfying (7.21) and (7.22). As shown in Section 7.3, this problem can be solved by the techniques of linear matrix inequalities (LMIs) and spectral factorization.
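The forward map that the stochastic realization problem inverts can be made concrete in a few lines. The sketch takes a Markov model (7.14) and (7.15), computes the data $(A, C, \bar{C}, \Lambda(0))$ through (7.16) and (7.19), and then recovers $(Q, S, R)$ from (7.21). It assumes NumPy and solves the Lyapunov equation by Kronecker vectorization, which is adequate only for small $n$. The innovation model (7.39) of Example 7.1 serves as test data, and the function names are hypothetical.

```python
import numpy as np

def lyap(A, Q):
    """Solve Π = A Π A^T + Q, the Lyapunov equation (7.16), by vectorization."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec.reshape(n, n)

def markov_to_data(A0, C0, Q, S, R):
    """Map a forward Markov model (7.14)-(7.15) to (Π, A, C, Cbar, Λ(0)) via (7.19)."""
    Pi = lyap(A0, Q)
    Cbar = (A0 @ Pi @ C0.T + S).T       # (7.19c)
    Lam0 = C0 @ Pi @ C0.T + R           # (7.19d)
    return Pi, A0, C0, Cbar, Lam0

# Innovation model (7.39): B = 4/9, D = 7/6, so Q = B B^T, S = B D^T, R = D D^T
B, D = np.array([[4/9]]), np.array([[7/6]])
Pi, A, C, Cbar, Lam0 = markov_to_data(np.array([[1/3]]), np.array([[2.0]]),
                                      B @ B.T, B @ D.T, D @ D.T)

# (7.21) recovers (Q, S, R) from the data and Π
Q_rec = Pi - A @ Pi @ A.T
S_rec = Cbar.T - A @ Pi @ C.T
R_rec = Lam0 - C @ Pi @ C.T
print(Pi.item(), Cbar.item(), Lam0.item())
```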

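Recall from Lemma 4.12 that a backward Markov model for a stationary process is given as follows.

Lemma 7.1. Define $x_b(t-1) = \bar{\Pi}x(t)$ with $\bar{\Pi} = \Pi^{-1}$. Then, the backward Markov model is given by
$$x_b(t-1) = A^Tx_b(t) + w_b(t) \tag{7.23a}$$
$$y(t) = \bar{C}x_b(t) + v_b(t) \tag{7.23b}$$
where $A$ and $\bar{C}$ are given by (7.20); $\bar{C}$ is called the backward output matrix. The processes $w_b$ and $v_b$ are zero mean white noises with covariance matrices
$$E\left\{\begin{bmatrix} w_b(t) \\ v_b(t) \end{bmatrix}\begin{bmatrix} w_b^T(s) & v_b^T(s) \end{bmatrix}\right\} = \begin{bmatrix} \bar{Q} & \bar{S} \\ \bar{S}^T & \bar{R} \end{bmatrix}\delta_{ts} \tag{7.24}$$
Moreover, we have $\mathrm{cov}\{x_b(t)\} = \bar{\Pi}$ and
$$\bar{Q} = \bar{\Pi} - A^T\bar{\Pi}A, \quad \bar{S} = C^T - A^T\bar{\Pi}\bar{C}^T, \quad \bar{R} = \Lambda(0) - \bar{C}\bar{\Pi}\bar{C}^T \tag{7.25}$$

Proof. See the proof of Lemma 4.12.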


We see that the forward Markov model is characterized by $(\Pi, A, C, \bar{C}, \Lambda(0))$, whereas the backward model is characterized by $(\bar{\Pi}, A^T, \bar{C}, C, \Lambda(0))$. Thus there exists a one-to-one correspondence between the forward and backward models of the form
$$\Pi \leftrightarrow \bar{\Pi}, \quad A \leftrightarrow A^T, \quad C \leftrightarrow \bar{C}, \quad Q \leftrightarrow \bar{Q}, \quad S \leftrightarrow \bar{S}, \quad R \leftrightarrow \bar{R}$$
This correspondence is used in Theorem 7.4 (see Section 7.4) to prove that the set of admissible $\Pi$ is bounded.


7.3 Solution of Stochastic Realization Problem 


Stochastic realization theory provides a technique for computing all the Markov models that generate a stationary stochastic process with a prescribed covariance matrix. It is also important in practice, since it serves as a theoretical foundation for subspace methods that identify state space models from given input-output data.
7.3.1 Linear Matrix Inequality

Suppose that the data $(A, C, \bar{C}, \Lambda(0))$ are given. Substituting $(Q, R, S)$ of (7.21) into (7.22), we see that the stochastic realization problem reduces to finding solutions $\Pi > 0$ of the LMI
$$M(\Pi) := \begin{bmatrix} \Pi - A\Pi A^T & \bar{C}^T - A\Pi C^T \\ \bar{C} - C\Pi A^T & \Lambda(0) - C\Pi C^T \end{bmatrix} \ge 0 \tag{7.26}$$
Note that if there exists $\Pi > 0$ that satisfies $M(\Pi) \ge 0$, then (7.21) yields $(Q, R, S)$ satisfying (7.22). Thus $Z(z)$ of (7.11) becomes positive real.


Theorem 7.1. Suppose that $(A, C, \bar{C}^T)$ is minimal, and $A$ is stable. Let $\Pi$ be a solution of the LMI (7.26), and let a factorization of $M(\Pi)$ be given by
$$M(\Pi) = \begin{bmatrix} B \\ D \end{bmatrix}\begin{bmatrix} B^T & D^T \end{bmatrix} \ge 0 \tag{7.27}$$
where $\begin{bmatrix} B \\ D \end{bmatrix}$ has full column rank. In terms of $B$ and $D$, we further define
$$W(z) = D + C(zI - A)^{-1}B \tag{7.28}$$
Then, $W(z)$ is a minimal spectral factor of the spectral density matrix $\Phi(z)$ that satisfies
$$\Phi(z) = W(z)W^T(z^{-1}) \tag{7.29}$$
Conversely, suppose that there exists a stable minimal spectral factor satisfying (7.29). Then, there exists a solution $\Pi > 0$ satisfying (7.26).


Proof. [107] Let $\Pi > 0$ be a solution of (7.26). Clearly, $B$ and $D$ satisfying (7.27) are unique up to orthogonal transformations. The (1,1)-block of the equality (7.27) gives the Lyapunov equation
$$\Pi - A\Pi A^T = BB^T \tag{7.30}$$
where $\Pi$ is positive definite and $A$ is stable. Thus it follows from Lemma 3.5 that $(A, B)$ is reachable, implying that $W(z)$ of (7.28) is minimal and stable.
Now, from (7.28), we have
$$\begin{aligned} W(z)W^T(z^{-1}) &= [D + C(zI - A)^{-1}B][D^T + B^T(z^{-1}I - A^T)^{-1}C^T] \\ &= DD^T + C(zI - A)^{-1}BB^T(z^{-1}I - A^T)^{-1}C^T \\ &\quad + C(zI - A)^{-1}BD^T + DB^T(z^{-1}I - A^T)^{-1}C^T \end{aligned} \tag{7.31}$$
From (7.30), we have the identity
$$BB^T = (zI - A)\Pi(z^{-1}I - A^T) + (zI - A)\Pi A^T + A\Pi(z^{-1}I - A^T)$$
We substitute this identity into (7.31) and rearrange the terms using $Q = BB^T$, $S = BD^T$, $R = DD^T$ and (7.21). This yields
$$\begin{aligned} W(z)W^T(z^{-1}) &= DD^T + C\Pi C^T + C(zI - A)^{-1}(A\Pi C^T + BD^T) \\ &\quad + (C\Pi A^T + DB^T)(z^{-1}I - A^T)^{-1}C^T \\ &= \Lambda(0) + C(zI - A)^{-1}\bar{C}^T + \bar{C}(z^{-1}I - A^T)^{-1}C^T \\ &= Z(z) + Z^T(z^{-1}) = \Phi(z) \end{aligned} \tag{7.32}$$
Thus we conclude that $W(z)$ is a stable minimal spectral factor of $\Phi(z)$.

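Conversely, suppose that $W(z) = D + C_1(zI - A_1)^{-1}B$ is a stable minimal spectral factor. Since $(A, C, \bar{C}, \Lambda(0))$ are given data, we can set $A_1 = A$ and $C_1 = C$ as in (7.19). Now using $W(z) = D + C(zI - A)^{-1}B$, we get
$$W(z)W^T(z^{-1}) = \begin{bmatrix} C(zI - A)^{-1} & I \end{bmatrix}\begin{bmatrix} BB^T & BD^T \\ DB^T & DD^T \end{bmatrix}\begin{bmatrix} (z^{-1}I - A^T)^{-1}C^T \\ I \end{bmatrix} \tag{7.33}$$
Since $A$ is stable and $(A, B)$ is reachable, the Lyapunov equation $BB^T = \Pi - A\Pi A^T$ has a unique solution $\Pi > 0$, from which
$$BB^T = (zI - A)\Pi(z^{-1}I - A^T) + (zI - A)\Pi A^T + A\Pi(z^{-1}I - A^T)$$
Substituting this identity into (7.33) yields
$$\begin{aligned} W(z)W^T(z^{-1}) &= DD^T + C\Pi C^T + C(zI - A)^{-1}(A\Pi C^T + BD^T) \\ &\quad + (C\Pi A^T + DB^T)(z^{-1}I - A^T)^{-1}C^T \end{aligned}$$
But from (7.11) and (7.12),
$$\begin{aligned} W(z)W^T(z^{-1}) &= \Phi(z) = Z(z) + Z^T(z^{-1}) \\ &= C(zI - A)^{-1}\bar{C}^T + \Lambda(0) + \bar{C}(z^{-1}I - A^T)^{-1}C^T \end{aligned}$$
holds. Since $(C, A)$ is observable, the columns of $C(zI - A)^{-1}$ are linearly independent. Thus, comparing the two expressions for $W(z)W^T(z^{-1})$ above gives
$$DD^T = \Lambda(0) - C\Pi C^T, \qquad BD^T = \bar{C}^T - A\Pi C^T$$
By using these relations in (7.33), we get
$$\begin{aligned} W(z)W^T(z^{-1}) &= \begin{bmatrix} C(zI - A)^{-1} & I \end{bmatrix}\begin{bmatrix} \Pi - A\Pi A^T & \bar{C}^T - A\Pi C^T \\ \bar{C} - C\Pi A^T & \Lambda(0) - C\Pi C^T \end{bmatrix}\begin{bmatrix} (z^{-1}I - A^T)^{-1}C^T \\ I \end{bmatrix} \\ &= \begin{bmatrix} C(zI - A)^{-1} & I \end{bmatrix}M(\Pi)\begin{bmatrix} (z^{-1}I - A^T)^{-1}C^T \\ I \end{bmatrix} \end{aligned}$$
The relations above show that $M(\Pi) = \begin{bmatrix} B \\ D \end{bmatrix}\begin{bmatrix} B^T & D^T \end{bmatrix} \ge 0$. Consequently, the right-hand side of the above equation is nonnegative definite for $z = e^{j\omega}$, $-\pi < \omega \le \pi$. Thus the triplet $(\Pi, B, D)$ satisfies (7.27), implying that the LMI (7.26) has a solution $\Pi > 0$.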
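Theorem 7.1 can be illustrated numerically: take $\Pi \in \mathcal{P}$, factor $M(\Pi)$ as in (7.27), and check (7.29) on the unit circle. The sketch assumes NumPy. The factor is unique only up to an orthogonal transformation, and computing it by an eigendecomposition is just one convenient choice. The data are those of Example 7.1 with the value $\Pi = 1/4$ from Example 7.2. For this $\Pi$, $M(\Pi)$ has rank 2, so the spectral factor is $1 \times 2$.

```python
import numpy as np

def lmi_matrix(Pi, A, C, Cbar, Lam0):
    """M(Π) of (7.26)."""
    return np.block([[Pi - A @ Pi @ A.T, Cbar.T - A @ Pi @ C.T],
                     [Cbar - C @ Pi @ A.T, Lam0 - C @ Pi @ C.T]])

def psd_factor(M, tol=1e-10):
    """Full-column-rank F with M = F F^T for symmetric M >= 0 (via eigendecomposition)."""
    evals, evecs = np.linalg.eigh(M)
    keep = evals > tol * max(evals.max(), 1.0)
    return evecs[:, keep] * np.sqrt(evals[keep])

# Data of Example 7.1 and the value Π = 1/4 used in Example 7.2
A, C = np.array([[1/3]]), np.array([[2.0]])
Cbar, Lam0 = np.array([[2/3]]), np.array([[9/4]])
n = A.shape[0]
Pi = np.array([[1/4]])

F = psd_factor(lmi_matrix(Pi, A, C, Cbar, Lam0))
B, D = F[:n, :], F[n:, :]               # factorization (7.27); m = F.shape[1]

def W(z):                               # spectral factor (7.28)
    return D + C @ np.linalg.solve(z * np.eye(n) - A, B)

def Phi(z):                             # spectral density via (7.11)-(7.12)
    Z = lambda s: C @ np.linalg.solve(s * np.eye(n) - A, Cbar.T) + 0.5 * Lam0
    return Z(z) + Z(1 / z).T

errs = [np.abs(W(z) @ W(1 / z).T - Phi(z)).max()
        for z in np.exp(1j * np.linspace(-np.pi, np.pi, 9))]
print("m =", F.shape[1], "max error =", max(errs))
```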


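Now we examine the size of the spectral factor $W(z)$ under the assumptions that $\begin{bmatrix} B \\ D \end{bmatrix}$ has full column rank and that $R(\Pi) := \Lambda(0) - C\Pi C^T > 0$.² Recall the factorization formula for a block matrix of the form (see Problem 2.2)
$$\begin{bmatrix} X & Y \\ Y^T & V \end{bmatrix} = \begin{bmatrix} I & YV^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} X - YV^{-1}Y^T & 0 \\ 0 & V \end{bmatrix}\begin{bmatrix} I & 0 \\ V^{-1}Y^T & I \end{bmatrix}$$
Comparing the above expression with $M(\Pi)$ of (7.26), we get $X = \Pi - A\Pi A^T$, $Y = \bar{C}^T - A\Pi C^T$ and $V = R(\Pi)$. Thus it can be shown that
$$M(\Pi) = \begin{bmatrix} I & K \\ 0 & I \end{bmatrix}\begin{bmatrix} \Pi - A\Pi A^T - KR(\Pi)K^T & 0 \\ 0 & R(\Pi) \end{bmatrix}\begin{bmatrix} I & 0 \\ K^T & I \end{bmatrix}$$
where $K$ is exactly the Kalman gain given by [see (5.74)]
$$K = (\bar{C}^T - A\Pi C^T)R^{-1}(\Pi)$$
We define
$$\mathrm{Ric}(\Pi) := A\Pi A^T - \Pi + KR(\Pi)K^T \tag{7.34}$$
Then, under the assumption that $R(\Pi) > 0$, we see that
$$M(\Pi) \ge 0 \iff \mathrm{Ric}(\Pi) \le 0$$
Hence a $\Pi$ with $R(\Pi) > 0$ satisfies $M(\Pi) \ge 0$ if and only if the algebraic Riccati inequality (ARI)
$$A\Pi A^T - \Pi + (\bar{C}^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1}(\bar{C} - C\Pi A^T) \le 0 \tag{7.35}$$
holds. Moreover, we have
$$m := \mathrm{rank}\,M(\Pi) = p + \mathrm{rank}\,\mathrm{Ric}(\Pi) \tag{7.36}$$
Hence, if $\Pi$ is a solution of the algebraic Riccati equation (ARE)
$$\Pi = A\Pi A^T + (\bar{C}^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1}(\bar{C} - C\Pi A^T) \tag{7.37}$$
then the corresponding spectral factor $W(z)$ is a $p \times p$ square matrix. Otherwise, we have $m > p$, so that the spectral factor $W(z)$ becomes a wide rectangular matrix.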
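The block factorization of $M(\Pi)$ and the rank relation (7.36) can be checked directly. The NumPy sketch below uses the scalar data of Example 7.1 and three values of $\Pi$: the ARE solutions $2/9$ and $1/2$, and the interior value $1/4$. The helper name `riccati_terms` and the rank tolerance are assumptions of the sketch.

```python
import numpy as np

A, C = np.array([[1/3]]), np.array([[2.0]])
Cbar, Lam0 = np.array([[2/3]]), np.array([[9/4]])     # data of Example 7.1
n, p = A.shape[0], C.shape[0]

def riccati_terms(Pi):
    """Return K, Ric(Π) of (7.34), M(Π) of (7.26) and R(Π); assumes R(Π) > 0."""
    R = Lam0 - C @ Pi @ C.T
    Y = Cbar.T - A @ Pi @ C.T
    K = Y @ np.linalg.inv(R)
    Ric = A @ Pi @ A.T - Pi + K @ R @ K.T
    M = np.block([[Pi - A @ Pi @ A.T, Y], [Y.T, R]])
    return K, Ric, M, R

In, Ip, Znp = np.eye(n), np.eye(p), np.zeros((n, p))
results = {}
for pi in (2/9, 1/4, 1/2):
    K, Ric, M, R = riccati_terms(np.array([[pi]]))
    # M = [I K; 0 I] diag(-Ric, R) [I 0; K^T I]
    M_fact = (np.block([[In, K], [Znp.T, Ip]]) @ np.block([[-Ric, Znp], [Znp.T, R]])
              @ np.block([[In, Znp], [K.T, Ip]]))
    rank_M = np.linalg.matrix_rank(M, tol=1e-9)
    rank_Ric = np.linalg.matrix_rank(Ric, tol=1e-9)
    results[pi] = (Ric.item(), rank_M, rank_Ric, np.allclose(M, M_fact))
    print(pi, results[pi])
```

For $\Pi = 1/4$, $\mathrm{Ric}(\Pi) = -1/45 < 0$ and $M(\Pi)$ has rank 2. At the two ARE solutions the rank drops to $p = 1$.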

It follows from Theorem 7.1 and the above argument that a Markov model of $y$ is given by
$$x(t+1) = Ax(t) + Bv(t) \tag{7.38a}$$
$$y(t) = Cx(t) + Dv(t) \tag{7.38b}$$
where $B$ and $D$ are obtained by factoring $M(\Pi)$ as in (7.27) for a solution $\Pi$ of the LMI, and $v$ is a white noise with mean zero and covariance matrix $I_m$, with $m \ge p$.

²Note that the latter condition cannot be avoided in the following development.
Lemma 7.2. Suppose that $(A, \bar{C}^T)$ is reachable. Then all solutions of the ARE (7.37) are positive definite. (Note that $\Pi = E\{x(t)x^T(t)\} \ge 0$.)

Proof. Suppose that $\Pi$ is not positive definite. Then, there exists $\eta \in \mathbb{R}^n$, $\eta \ne 0$, with $\eta^T\Pi = 0$. Thus, pre-multiplying (7.37) by $\eta^T$ and post-multiplying by $\eta$ yield
$$\eta^TA\Pi A^T\eta + \eta^T(\bar{C}^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1}(\bar{C} - C\Pi A^T)\eta = 0$$
Since both terms on the left-hand side are nonnegative, we have
$$\eta^TA\Pi = 0, \qquad \eta^T\bar{C}^T = 0$$
Define $\eta_1 := A^T\eta$. Then, from the above, $\eta_1^T\Pi = 0$ holds, and hence we get
$$\eta_1^TA\Pi = 0, \quad \eta_1^T\bar{C}^T = 0 \implies \eta^TA^2\Pi = 0, \quad \eta^TA\bar{C}^T = 0$$
Repeating this procedure, we eventually have
$$\eta^T\begin{bmatrix} \bar{C}^T & A\bar{C}^T & A^2\bar{C}^T & \cdots \end{bmatrix} = 0$$
This contradicts the reachability of $(A, \bar{C}^T)$, implying that $\Pi > 0$.


The ARE (7.37) is the same as the ARE (5.75) satisfied by the state covariance matrix of the stationary Kalman filter. Hence, we see that the square spectral factor is closely related to the stationary Kalman filter, as shown in Example 7.1 below.


7.3.2 Simple Examples


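Example 7.1. Suppose that $A = 1/3$, $C = 2$, $\bar{C} = 2/3$, $\Lambda(0) = 9/4$ are given. From (7.11) and (7.12), the spectral density is given by
$$\Phi(z) = Z(z) + Z(z^{-1}) = \frac{4/3}{z - 1/3} + \frac{9}{8} + \frac{4/3}{z^{-1} - 1/3} + \frac{9}{8}$$
It is easy to see that
$$Z(z) = \frac{9}{8} + \frac{4/3}{z - 1/3}$$
is strictly positive real, since $\mathrm{Re}\,Z(e^{j\omega}) \ge 9/8 - 1 = 1/8 > 0$, with the minimum attained at $\omega = \pi$. Also, the ARI (7.35) becomes
$$\mathrm{Ric}(\Pi) = \frac{1}{9}\Pi - \Pi + \left(\frac{2}{3} - \frac{2}{3}\Pi\right)^2\left(\frac{9}{4} - 4\Pi\right)^{-1} \le 0$$
where $9/4 - 4\Pi > 0$ by the assumption. It therefore follows that
$$4\Pi^2 - \frac{26}{9}\Pi + \frac{4}{9} = 4\left(\Pi - \frac{2}{9}\right)\left(\Pi - \frac{1}{2}\right) \le 0$$
Hence, we have $\Pi_* = 2/9$ and $\Pi^* = 1/2$, and any solution $\Pi$ of the Riccati inequality satisfies $\Pi_* \le \Pi \le \Pi^*$. It should be noted that these are also boundary solutions of the LMI.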
i) Consider the case where $\Pi = \Pi_* = 2/9$. From the LMI of (7.26),
$$M(2/9) = \begin{bmatrix} 16/81 & 14/27 \\ 14/27 & 49/36 \end{bmatrix} = \begin{bmatrix} 4/9 \\ 7/6 \end{bmatrix}\begin{bmatrix} 4/9 & 7/6 \end{bmatrix}$$
Thus we have $B = 4/9$, $D = 7/6$, so that, from (7.28), the spectral factor is given by
$$W_*(z) = \frac{7}{6} + \frac{8/9}{z - 1/3} = \frac{7}{6}\,\frac{z + 3/7}{z - 1/3}$$
The corresponding Markov model is given by the innovation model
$$x(t+1) = \frac{1}{3}x(t) + \frac{4}{9}v(t) \tag{7.39a}$$
$$y(t) = 2x(t) + \frac{7}{6}v(t) \tag{7.39b}$$
where $v$ is a white noise with mean 0 and variance 1.

ii) Consider the case where $\Pi = \Pi^* = 1/2$. In this case, we have
$$M(1/2) = \begin{bmatrix} 4/9 & 1/3 \\ 1/3 & 1/4 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 1/2 \end{bmatrix}\begin{bmatrix} 2/3 & 1/2 \end{bmatrix}$$
Thus we get $B = 2/3$, $D = 1/2$, so that the spectral factor is given by
$$W^*(z) = \frac{1}{2} + \frac{4/3}{z - 1/3} = \frac{1}{2}\,\frac{z + 7/3}{z - 1/3}$$
and the Markov model becomes
$$x(t+1) = \frac{1}{3}x(t) + \frac{2}{3}v(t) \tag{7.40a}$$
$$y(t) = 2x(t) + \frac{1}{2}v(t) \tag{7.40b}$$

We observe that $W_*(z)$ and its inverse $W_*^{-1}(z)$ are stable, so $W_*(z)$ is a minimal phase function. The inverse of $W^*(z)$, however, is unstable, so $W^*(z)$ is a non-minimal phase function.
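The computations of Example 7.1 can be reproduced with a short NumPy sketch; the helper name is hypothetical. It solves the scalar ARE through the quadratic above and forms the rank-one factors of $M(\Pi_*)$ and $M(\Pi^*)$. It then locates the zero of each spectral factor. The zero should lie inside the unit circle for $\Pi_*$ and outside it for $\Pi^*$.

```python
import numpy as np

a, c, cbar, lam0 = 1/3, 2.0, 2/3, 9/4      # data of Example 7.1

# Ric(Π) (9/4 - 4Π) = 4Π^2 - (26/9)Π + 4/9, so the scalar ARE reduces to this quadratic
pi_lo, pi_hi = sorted(np.roots([4.0, -26/9, 4/9]).real)

def factor_and_zero(pi):
    """Rank-one factor [b; d] of M(Π) and the zero of W(z) = d + c b / (z - a)."""
    m11 = pi - a * pi * a
    m12 = cbar - c * pi * a
    m22 = lam0 - c * pi * c
    d = np.sqrt(m22)
    b = m12 / d                     # then m11 = b**2 because Ric(Π) = 0
    zero = a - c * b / d            # root of d (z - a) + c b = 0
    return b, d, zero, m11

for pi in (pi_lo, pi_hi):
    b, d, zero, m11 = factor_and_zero(pi)
    print(f"Pi = {pi:.6f}: B = {b:.6f}, D = {d:.6f}, zero of W = {zero:.6f}")
```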


The next example concerns a spectral factor for a $\Pi$ that satisfies the Riccati inequality strictly, i.e., $2/9 < \Pi < 1/2$.


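Example 7.2. For simplicity, let $\Pi = 1/4$. Then, we have
$$M(1/4) = \begin{bmatrix} 2/9 & 1/2 \\ 1/2 & 5/4 \end{bmatrix} = \begin{bmatrix} b_1 & b_2 \\ d & 0 \end{bmatrix}\begin{bmatrix} b_1 & d \\ b_2 & 0 \end{bmatrix}$$
The above equation has other solutions as well; we pick the particular solution $d = \sqrt{5}/2$, $b_1 = 1/\sqrt{5}$, $b_2 = 1/\sqrt{45}$. Thus the spectral factor is given by
$$W(z) = \begin{bmatrix} d & 0 \end{bmatrix} + \frac{2}{z - 1/3}\begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} \dfrac{\sqrt{5}}{2} + \dfrac{2/\sqrt{5}}{z - 1/3} & \dfrac{2/\sqrt{45}}{z - 1/3} \end{bmatrix}$$
In this case, since $m = \mathrm{rank}\,M(1/4) = 2$, the spectral factor becomes rectangular, and the corresponding Markov model is given by
$$x(t+1) = \frac{1}{3}x(t) + \begin{bmatrix} \dfrac{1}{\sqrt{5}} & \dfrac{1}{\sqrt{45}} \end{bmatrix}\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix} \tag{7.41a}$$
$$y(t) = 2x(t) + \frac{\sqrt{5}}{2}v_1(t) \tag{7.41b}$$
where $v_1$ and $v_2$ are mutually independent white noises with mean zero and unit variance. The choice $\Pi = 1/4$ means that the variance of the state $x$ of (7.41) is $1/4$.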
Now we construct the stationary Kalman filter for the system (7.41). Since $A = 1/3$, $C = 2$, $Q = 2/9$, $S = 1/2$, $R = 5/4$, the ARE of (5.67) becomes
$$P = \frac{1}{9}P + \frac{2}{9} - \left(\frac{2}{3}P + \frac{1}{2}\right)^2\left(4P + \frac{5}{4}\right)^{-1}$$
Rearranging the above equation yields
$$4P^2 + \frac{8}{9}P - \frac{1}{36} = 4\left(P + \frac{1}{4}\right)\left(P - \frac{1}{36}\right) = 0$$
Since $P \ge 0$, we have $P = 1/36$, and from (5.68) the Kalman gain is $K = 8/21$. This implies that the stationary Kalman filter has the form
$$\hat{x}(t+1 \mid t) = \frac{1}{3}\hat{x}(t \mid t-1) + \frac{8}{21}e(t) \tag{7.42a}$$
$$y(t) = 2\hat{x}(t \mid t-1) + e(t) \tag{7.42b}$$
where $e$ is the innovation process with zero mean and covariance (see Lemma 5.7)
$$\mathrm{cov}\{e\} = C^2P + R = (7/6)^2$$
It is easily shown that the innovation process $e$ is related to $v$ of (7.39) via $e(t) = (7/6)v(t)$. Under this relation, (7.39) and (7.42) are equivalent. Also, the transfer function of the stationary Kalman filter from $e$ to $y$ becomes
$$T_{ye}(z) = 1 + \frac{16/21}{z - 1/3} = \frac{z + 3/7}{z - 1/3} = \frac{6}{7}W_*(z)$$
We see that $T_{ye}(z)$ equals the minimal phase spectral factor obtained in Example 7.1 up to the constant factor $6/7$.

It can also be shown that the stationary Kalman filter for the state space model (7.40) coincides with the one derived above. Hence the spectral factor corresponding to $\Pi_*$ is of minimal phase, and its state space model is given by a stationary Kalman filter.
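The Kalman filter quantities of Example 7.2 can be obtained by iterating the Riccati difference equation to its fixed point. This is a sketch assuming NumPy and scalar data. Started from $P = 0$, the iteration converges here to the nonnegative solution $P = 1/36$. The last function pair compares the filter transfer function with $(6/7)W_*(z)$.

```python
import numpy as np

a, c = 1/3, 2.0
q, s, r = 2/9, 1/2, 5/4            # Q, S, R of the model (7.41)

p = 0.0
for _ in range(200):               # Riccati difference equation iterated to a fixed point
    g = a * p * c + s
    p = a * p * a + q - g * g / (c * p * c + r)

k = (a * p * c + s) / (c * p * c + r)     # stationary Kalman gain
re = c * p * c + r                        # innovation variance C P C^T + R

def T_ye(z):                              # filter transfer function from e to y, cf. (7.42)
    return 1 + c * k / (z - a)

def W_min(z):                             # minimal phase spectral factor W_*(z) of Example 7.1
    return 7/6 + (8/9) / (z - a)

print(p, k, re)
```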
Example 7.3. Consider the case where $A = 1/3$, $C = 2$, $\bar{C} = 2/3$ and $\Lambda(0) = 2$. Note that $(A, C, \bar{C})$ are the same as in Example 7.1, while $\Lambda(0)$ is reduced by $1/4$. Thus we have
$$Z(z) = 1 + \frac{4/3}{z - 1/3} \implies 0 \le \mathrm{Re}\,Z(e^{j\omega}) \le 3$$
Thus $Z(z)$ is positive real, but not strictly positive real. Under the assumption that $R(\Pi) = 2 - 4\Pi > 0$, it follows from (7.35) that
$$\mathrm{Ric}(\Pi) = \frac{1}{9}\Pi - \Pi + \left(\frac{2}{3} - \frac{2}{3}\Pi\right)^2(2 - 4\Pi)^{-1} \le 0$$
Rearranging the above inequality yields
$$9\Pi^2 - 6\Pi + 1 \le 0 \implies (3\Pi - 1)^2 \le 0$$
Obviously, this inequality has only one degenerate solution $\Pi = \Pi_* = \Pi^* = 1/3$, so that
$$M(1/3) = \begin{bmatrix} 8/27 & 4/9 \\ 4/9 & 2/3 \end{bmatrix} = \begin{bmatrix} 2\sqrt{2}/(3\sqrt{3}) \\ \sqrt{2}/\sqrt{3} \end{bmatrix}\begin{bmatrix} 2\sqrt{2}/(3\sqrt{3}) & \sqrt{2}/\sqrt{3} \end{bmatrix}$$
Thus the spectral factor is given by
$$W(z) = \frac{\sqrt{2}}{\sqrt{3}} + \frac{4\sqrt{2}}{3\sqrt{3}}\,\frac{1}{z - 1/3} = \frac{\sqrt{2}}{\sqrt{3}}\,\frac{z + 1}{z - 1/3}$$
We see that if the data $(A, C, \bar{C}, \Lambda(0))$ do not satisfy the strictly positive real condition, the Riccati inequality degenerates, and the spectral factor $W(z)$ has zeros on the unit circle.
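Example 7.3 can be checked in the same way. The sketch assumes NumPy. It takes the double root of $9\Pi^2 - 6\Pi + 1 = 0$ and verifies that $M(1/3)$ has rank one. It then shows that the zero of $W(z)$ lies on the unit circle, at $z = -1$. Numerical root finders resolve a double root only to about the square root of machine precision. The sketch therefore averages the two computed roots, since their sum is obtained much more accurately.

```python
import numpy as np

a, c, cbar, lam0 = 1/3, 2.0, 2/3, 2.0          # data of Example 7.3
roots = np.roots([9.0, -6.0, 1.0])              # 9Π^2 - 6Π + 1 = (3Π - 1)^2
pi = roots.real.mean()                          # double root Π = 1/3
M = np.array([[pi - a * pi * a, cbar - c * pi * a],
              [cbar - c * pi * a, lam0 - c * pi * c]])
d = np.sqrt(M[1, 1])
b = M[0, 1] / d                                 # rank-one factor [b; d][b d]
zero = a - c * b / d                            # zero of W(z) = d + c b / (z - a)
rank_M = np.linalg.matrix_rank(M, tol=1e-9)
print(pi, rank_M, zero)
```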


From the above examples, we see the following under the assumption that $R(\Pi) > 0$. There exist a maximum solution $\Pi^*$ and a minimum solution $\Pi_*$ of the LMI (7.26), and all other solutions are bounded by them ($\Pi_* \le \Pi \le \Pi^*$). If we can show that this observation holds in the general matrix case, then we can completely solve the stochastic realization problem. The rest of this chapter is devoted to studies in this direction.


7.4 Positivity and Existence of Markov Models 


7.4.1 Positive Real Lemma 
Let $\mathcal{P}$ be the set of solutions of the LMI (7.26), i.e.,
$$\mathcal{P} = \{\Pi \mid M(\Pi) \ge 0,\ \Pi^T = \Pi,\ \Pi \ge 0\}$$
Note that the condition that $\Pi$ is positive definite is not imposed here. However, we will prove in Subsection 7.4.2 that all $\Pi \in \mathcal{P}$ are positive definite under the minimality assumption.

Given a positive definite $\Pi \in \mathcal{P}$, we can find $B$ and $D$ by the factorization (7.27), and hence we get $Q = BB^T$, $S = BD^T$, $R = DD^T$. Thus, associated with $\Pi \in \mathcal{P}$, there exists a Markov model for $y$ given by (7.14) [or (7.38)]. In this section, we prove some results that characterize the set $\mathcal{P}$.


Lemma 7.3. The set $\mathcal{P}$ defined above is closed and convex.

Proof. To prove closedness, consider a sequence $\Pi_1, \Pi_2, \ldots \in \mathcal{P}$ such that $\lim_{k\to\infty}\Pi_k = \Pi$, i.e., $\lim_{k\to\infty}\|\Pi_k - \Pi\| = 0$. Since $M(\cdot)$ is continuous (see Problem 7.4), we get
$$0 \le \lim_{k\to\infty}M(\Pi_k) = M\left(\lim_{k\to\infty}\Pi_k\right) = M(\Pi)$$
Clearly, $\Pi$ is symmetric and nonnegative definite, so that $\Pi \in \mathcal{P}$ holds.
Now suppose that $\Pi_1, \Pi_2 \in \mathcal{P}$, and let $\alpha + \beta = 1$ with $\alpha, \beta \ge 0$. Since $M(\cdot)$ is affine in $\Pi$, we obtain
$$M(\alpha\Pi_1 + \beta\Pi_2) = \alpha M(\Pi_1) + \beta M(\Pi_2) \ge 0$$
Thus, $\alpha\Pi_1 + \beta\Pi_2 \in \mathcal{P}$. This completes the proof.


Definition 7.2. For $u(i) \in \mathbb{R}^p$, $i = -1, -2, \ldots$, we define the infinite-dimensional vector
$$u = \begin{bmatrix} u(-1) \\ u(-2) \\ \vdots \end{bmatrix}$$
Also, associated with the Toeplitz matrix $T_+$ of (7.5), we define a quadratic form
$$u^TT_+u := \sum_{k,l=-\infty}^{-1}u^T(k)\Lambda(l - k)u(l) \tag{7.43}$$
where $k$ and $l$ take negative integer values. If $u^TT_+u \ge 0$ holds for any $u$, then $T_+$ is called positive real. If, in addition, $u^TT_+u = 0$ implies $u = 0$, then $T_+$ is called strictly positive real. Moreover, if the condition
$$u^TT_+u \ge \rho\|u\|^2, \quad \exists\rho > 0 \tag{7.44}$$
holds, $T_+$ is called coercive.

It can be shown that the Toeplitz matrix $T_+$ is positive real if and only if all finite block Toeplitz matrices are nonnegative definite, i.e.,
$$T_+(N) = \begin{bmatrix} \Lambda(0) & \Lambda^T(1) & \cdots & \Lambda^T(N-1) \\ \Lambda(1) & \Lambda(0) & \cdots & \Lambda^T(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda(N-1) & \Lambda(N-2) & \cdots & \Lambda(0) \end{bmatrix} \ge 0, \quad \forall N \tag{7.45}$$
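holds. Also, define the block anti-diagonal matrix
$$J = \begin{bmatrix} 0 & \cdots & 0 & I_p \\ 0 & \cdots & I_p & 0 \\ \vdots & & & \vdots \\ I_p & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{Np\times Np}$$
Then, for the finite block Toeplitz matrix
$$T_-(N) = \begin{bmatrix} \Lambda(0) & \Lambda(1) & \cdots & \Lambda(N-1) \\ \Lambda^T(1) & \Lambda(0) & \cdots & \Lambda(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda^T(N-1) & \Lambda^T(N-2) & \cdots & \Lambda(0) \end{bmatrix} \tag{7.46}$$
we have $T_-(N) = JT_+(N)J$. Thus, $T_+$ is positive real if and only if $T_-$ is positive real.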
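Condition (7.45) and the relation $T_-(N) = JT_+(N)J$ can be examined numerically for finite $N$. The sketch assumes NumPy, and its helper names are hypothetical. It computes the smallest eigenvalue of $T_+(N)$ for the data of Examples 7.1 and 7.3. It also checks the flip relation for a hypothetical two-output data set. A finite $N$ can only illustrate positive realness, not establish it.

```python
import numpy as np

def Lam(l, A, C, Cbar, Lam0):
    """Λ(l) from (7.7), with Λ(-l) = Λ(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l, A, C, Cbar, Lam0).T

def T_plus(N, *data):    # (7.45): block (i, j) is Λ(i - j)
    return np.block([[Lam(i - j, *data) for j in range(N)] for i in range(N)])

def T_minus(N, *data):   # (7.46): block (i, j) is Λ(j - i)
    return np.block([[Lam(j - i, *data) for j in range(N)] for i in range(N)])

ex71 = (np.array([[1/3]]), np.array([[2.0]]), np.array([[2/3]]), np.array([[9/4]]))
ex73 = (np.array([[1/3]]), np.array([[2.0]]), np.array([[2/3]]), np.array([[2.0]]))
N = 30
min71 = np.linalg.eigvalsh(T_plus(N, *ex71)).min()
min73 = np.linalg.eigvalsh(T_plus(N, *ex73)).min()

# T_-(N) = J T_+(N) J for hypothetical two-output data
data2 = (np.array([[0.5, 0.1], [0.0, -0.4]]), np.array([[1.0, 0.0], [0.5, 1.0]]),
         np.array([[0.3, 0.2], [0.1, 0.4]]), np.array([[2.0, 0.3], [0.3, 1.5]]))
p = 2
J = np.kron(np.fliplr(np.eye(N)), np.eye(p))    # block anti-diagonal J
flip_ok = np.allclose(T_minus(N, *data2), J @ T_plus(N, *data2) @ J)
print(min71, min73, flip_ok)
```

For scalar data, the eigenvalues of $T_+(N)$ lie between the minimum and maximum of $\Phi(\omega)$. So the first value should not fall below $1/4$, while the second may approach $0$ as $N$ grows.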

The following theorem gives a necessary and sufficient condition for the set $\mathcal{P}$ to be non-empty.

Theorem 7.2. (Positive real lemma) The set $\mathcal{P}$ is non-empty if and only if the Toeplitz operator $T_+$ of (7.5) is positive real.

The proof of this theorem requires the following lemma, which gives a useful identity satisfied by the matrices $\Pi, Q, R, S$.


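Lemma 7.4. Suppose that $\Pi, Q, R, S$ are solutions of (7.21). Then, we have
$$u^TT_+u = \xi^T(0)\Pi\xi(0) + \sum_{t=-\infty}^{-1}\begin{bmatrix} \xi^T(t) & u^T(t) \end{bmatrix}\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} \tag{7.47}$$
where $\xi$ is given by
$$\xi(t+1) = A^T\xi(t) + C^Tu(t), \quad \xi(-\infty) = 0 \tag{7.48}$$

Proof. The proof is deferred to the Appendix, Section 7.10.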
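Once $Q, S, R$ are defined from $\Pi$ through (7.21), the identity (7.47) is purely algebraic. It can therefore be tested with an arbitrary symmetric $\Pi$ and an input $u$ of finite support, so that $\xi$ vanishes before the support starts. The sketch assumes NumPy and uses hypothetical two-output data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.5, 0.1], [0.0, -0.4]]); C = np.array([[1.0, 0.0], [0.5, 1.0]])
Cbar = np.array([[0.3, 0.2], [0.1, 0.4]]); Lam0 = np.array([[2.0, 0.3], [0.3, 1.5]])
n, p, K = 2, 2, 6

def Lam(l):
    """Λ(l) from (7.7), with Λ(-l) = Λ(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l).T

# Any symmetric Π; (Q, S, R) are then defined by (7.21)
X = rng.standard_normal((n, n)); Pi = X + X.T
Q = Pi - A @ Pi @ A.T
S = Cbar.T - A @ Pi @ C.T
R = Lam0 - C @ Pi @ C.T
M = np.block([[Q, S], [S.T, R]])

u = {t: rng.standard_normal(p) for t in range(-K, 0)}   # u(t) = 0 for t < -K

lhs = sum(u[k] @ Lam(l - k) @ u[l] for k in u for l in u)   # quadratic form (7.43)

xi = np.zeros(n)                 # ξ(-K) = 0, since u vanishes before t = -K
rhs = 0.0
for t in range(-K, 0):
    v = np.concatenate([xi, u[t]])
    rhs += v @ M @ v
    xi = A.T @ xi + C.T @ u[t]    # recursion (7.48)
rhs += xi @ Pi @ xi               # ξ(0)^T Π ξ(0)
print(lhs, rhs)
```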


Proof of Theorem 7.2. If $\Pi, Q, R, S$ satisfy (7.21) and (7.22), then the right-hand side of (7.47) is nonnegative. Thus $T_+$ is positive real.

Conversely, suppose that $T_+$ is positive real; we show that $\mathcal{P} \ne \emptyset$. To this end, define
$$\mathcal{S}(\xi) = \left\{u \;\middle|\; \xi = \sum_{t=-\infty}^{-1}(A^T)^{-t-1}C^Tu(t)\right\}, \quad \xi \in \mathbb{R}^n$$
and define a nonnegative matrix $\Pi^*$ by³
$$\xi^T\Pi^*\xi = \min_{u\in\mathcal{S}(\xi)}\sum_{k,l=-\infty}^{-1}u^T(k)\Lambda(l - k)u(l) = \min_{u\in\mathcal{S}(\xi)}u^TT_+u \tag{7.49}$$
In terms of $\Pi^*$, we define [see (7.21)]
$$Q^* = \Pi^* - A\Pi^*A^T, \quad S^* = \bar{C}^T - A\Pi^*C^T, \quad R^* = \Lambda(0) - C\Pi^*C^T$$
It is easy to see that if we can prove
$$\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix} \ge 0 \tag{7.50}$$
then $\mathcal{P} \ne \emptyset$, which completes the proof.
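To prove (7.50), we consider the system of (7.48):
$$\xi(t+1) = A^T\xi(t) + C^Tu(t), \quad \xi(-\infty) = 0$$
Let $u = (\ldots, u(-2), u(-1))$ be a control vector that brings the state vector to $\xi(0) = \xi$ at $t = 0$. It then follows from Lemma 7.4 that
$$u^TT_+u = \xi^T\Pi^*\xi + \sum_{t=-\infty}^{-1}\begin{bmatrix} \xi^T(t) & u^T(t) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} \tag{7.51}$$
Also, let $v = (\ldots, v(-2), v(-1))$ be defined by
$$v(t) := u(t+1), \quad t = -2, -3, \ldots$$
where $v(-1)$ is not specified. Let $\xi_v$ denote the corresponding states. From the definition of the control vector $v$, it follows that
$$\xi_v(t) = \xi(t+1), \quad t = -2, -3, \ldots; \qquad \xi_v(-1) = \xi(0) = \xi$$
with the boundary conditions $\xi_v(-\infty) = \xi(-\infty) = 0$. Define $\xi_v(0) = \zeta$. Then, we see from Lemma 7.4 that
$$\begin{aligned} v^TT_+v &= \zeta^T\Pi^*\zeta + \sum_{t=-\infty}^{-1}\begin{bmatrix} \xi_v^T(t) & v^T(t) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi_v(t) \\ v(t) \end{bmatrix} \\ &= \zeta^T\Pi^*\zeta + \sum_{t=-\infty}^{-2}\begin{bmatrix} \xi^T(t+1) & u^T(t+1) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi(t+1) \\ u(t+1) \end{bmatrix} \\ &\quad + \begin{bmatrix} \xi^T & v^T(-1) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi \\ v(-1) \end{bmatrix} \\ &= \zeta^T\Pi^*\zeta + \sum_{t=-\infty}^{-1}\begin{bmatrix} \xi^T(t) & u^T(t) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} \\ &\quad + \begin{bmatrix} \xi^T & v^T(-1) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi \\ v(-1) \end{bmatrix} \end{aligned} \tag{7.52}$$

³This is an optimal control problem that minimizes a generalized energy with the terminal condition $\xi(0) = \xi$. We show in Subsection 7.4.2 that the right-hand side of (7.49) is quadratic in $\xi$, and that $\Pi^*$ is positive definite.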
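Since $\xi(0) = \xi$, subtracting (7.51) from (7.52) gives
$$v^TT_+v - u^TT_+u = \zeta^T\Pi^*\zeta - \xi^T\Pi^*\xi + \begin{bmatrix} \xi^T & v^T(-1) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi \\ v(-1) \end{bmatrix}$$
Hence, we get
$$\begin{bmatrix} \xi^T & v^T(-1) \end{bmatrix}\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix}\begin{bmatrix} \xi \\ v(-1) \end{bmatrix} = (v^TT_+v - \zeta^T\Pi^*\zeta) - (u^TT_+u - \xi^T\Pi^*\xi) \tag{7.53}$$
By the definition of $\Pi^*$, $v^TT_+v - \zeta^T\Pi^*\zeta \ge 0$. Moreover, $u^TT_+u - \xi^T\Pi^*\xi$ can be made arbitrarily small by a proper choice of $u$. Hence the right-hand side of (7.53) is nonnegative. Since $\xi$ and $v(-1)$ are arbitrary, we have proved (7.50).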


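Theorem 7.3. The following statements are equivalent.
(i) The Toeplitz matrix $T_+$ of (7.5) is positive real.
(ii) The transfer matrix $Z(z)$ of (7.11) is positive real.

Proof. From (7.11), a state space model corresponding to $Z^T(z)$ is given by
$$\bar{x}(t+1) = A^T\bar{x}(t) + C^Tu(t), \quad \bar{x}(-\infty) = 0 \tag{7.54a}$$
$$\bar{y}(t) = \bar{C}\bar{x}(t) + \tfrac{1}{2}\Lambda(0)u(t) \tag{7.54b}$$
From (7.54), we see that
$$\bar{y}(t) = \sum_{k=-\infty}^{t-1}\bar{C}(A^T)^{t-k-1}C^Tu(k) + \tfrac{1}{2}\Lambda(0)u(t) = \sum_{k=-\infty}^{t-1}\Lambda^T(t - k)u(k) + \tfrac{1}{2}\Lambda(0)u(t)$$
and hence
$$\bar{y}^T(t)u(t) = \sum_{k=-\infty}^{t-1}u^T(k)\Lambda(t - k)u(t) + \tfrac{1}{2}u^T(t)\Lambda(0)u(t)$$
Summing both sides of the above equation yields [see (7.87), Section 7.10]
$$\sum_{t=-\infty}^{-1}\bar{y}^T(t)u(t) = \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{t-1}u^T(k)\Lambda(t - k)u(t) + \frac{1}{2}\sum_{t=-\infty}^{-1}u^T(t)\Lambda(0)u(t) = \frac{1}{2}u^TT_+u$$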
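Since $\bar{y} = Z^T(z)u$, it follows from Lemma 3.4 (ii) that
$$\begin{aligned} \sum_{t=-\infty}^{-1}\bar{y}^T(t)u(t) &= \frac{1}{2\pi}\int_{-\pi}^{\pi}u^*(e^{j\omega})Z(e^{j\omega})u(e^{j\omega})\,d\omega \\ &= \frac{1}{4\pi}\int_{-\pi}^{\pi}u^*(e^{j\omega})[Z(e^{j\omega}) + Z^*(e^{j\omega})]u(e^{j\omega})\,d\omega \end{aligned}$$
The right-hand side of this equation is nonnegative for any $u(e^{j\omega})$ if and only if
$$Z(e^{j\omega}) + Z^*(e^{j\omega}) \ge 0, \quad -\pi < \omega \le \pi$$
Hence, we have
$$u^TT_+u = 2\sum_{t=-\infty}^{-1}\bar{y}^T(t)u(t) \ge 0 \iff Z(z) \text{ is positive real} \tag{7.55}$$
This completes the proof of this theorem.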
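The key step of this proof is the identity $\sum_t \bar{y}^T(t)u(t) = \tfrac{1}{2}u^TT_+u$. It can be verified by simulating the state space model (7.54) with an input of finite support. The NumPy sketch below uses hypothetical two-output data.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1], [0.0, -0.4]]); C = np.array([[1.0, 0.0], [0.5, 1.0]])
Cbar = np.array([[0.3, 0.2], [0.1, 0.4]]); Lam0 = np.array([[2.0, 0.3], [0.3, 1.5]])
n, p, K = 2, 2, 8

def Lam(l):
    """Λ(l) from (7.7), with Λ(-l) = Λ(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l).T

u = {t: rng.standard_normal(p) for t in range(-K, 0)}        # u(t) = 0 for t < -K
quad = sum(u[k] @ Lam(l - k) @ u[l] for k in u for l in u)   # u^T T_+ u, (7.43)

xb = np.zeros(n)                       # state of (7.54) at t = -K
energy = 0.0
for t in range(-K, 0):
    yb = Cbar @ xb + 0.5 * Lam0 @ u[t]     # output equation (7.54b)
    energy += yb @ u[t]
    xb = A.T @ xb + C.T @ u[t]             # state equation (7.54a)
print(quad, 2 * energy)
```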


Theorem 7.4. Suppose that the Toeplitz matrix $T_+$ of (7.5) is positive real. Then $\mathcal{P}$ is bounded, closed and convex. Moreover, there exist a maximum $\Pi^*$ and a minimum $\Pi_*$ such that, for any $\Pi \in \mathcal{P}$,
$$\Pi_* \le \Pi \le \Pi^* \tag{7.56}$$
holds, where the inequality $A \ge B$ means that $A - B$ is nonnegative definite.


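Proof. Lemma 7.3 has already shown that $\mathcal{P}$ is a closed convex set. Thus it suffices to show that (7.56) holds.

First we show that $\Pi \le \Pi^*$ for all $\Pi \in \mathcal{P}$. It follows from the definition of $\Pi^*$ in (7.49) and Lemma 7.4 that
$$\xi^T\Pi^*\xi = \min_{u\in\mathcal{S}(\xi)}\left\{\xi^T\Pi\xi + \sum_{t=-\infty}^{-1}\begin{bmatrix} \xi^T(t) & u^T(t) \end{bmatrix}\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix}\right\}$$
where $\xi(0) = \xi$. For any $\Pi \in \mathcal{P}$, the sum is nonnegative, and it therefore follows that
$$\xi^T\Pi^*\xi \ge \xi^T\Pi\xi$$
Since $\xi$ is arbitrary, $\Pi \le \Pi^*$ holds.
To show that $\Pi \ge \Pi_*$, we define the backward process $\tilde{y}$ by
$$\tilde{y}(t) = y(-t), \quad t = 0, \pm 1, \ldots \tag{7.57}$$
Clearly, the process $\tilde{y}$ is stationary, since
$$\tilde{\Lambda}(l) = E\{\tilde{y}(t+l)\tilde{y}^T(t)\} = E\{y(-t-l)y^T(-t)\} = \Lambda(-l)$$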
Let $\tilde{\mathcal{P}}$ be the set of covariance matrices $\tilde{\Pi}$ of the backward Markov models associated with the covariance matrices $\{\tilde{\Lambda}(k)\}$. As shown in Section 7.2, there exists a one-to-one correspondence between $\mathcal{P}$ and $\tilde{\mathcal{P}}$ in the sense that
$$\Pi \in \mathcal{P} \iff \tilde{\Pi} = \Pi^{-1} \in \tilde{\mathcal{P}}$$
Now let $\tilde{\Pi}^*$ be the maximum element of the set $\tilde{\mathcal{P}}$; its existence follows by the same technique as in the first part of the proof. By the above one-to-one correspondence, $\Pi_* := (\tilde{\Pi}^*)^{-1}$ is then the minimum element of $\mathcal{P}$. Hence, $\Pi_* \le \Pi$ holds for any $\Pi \in \mathcal{P}$. Therefore, if $\mathcal{P}$ is not empty, (7.56) holds.


7.4.2 Computation of Extremal Points 


We have shown that $\Pi^*$ and $\Pi_*$ are respectively the maximum and the minimum in the set $\mathcal{P}$, and that for any $\Pi \in \mathcal{P}$, the inequality $\Pi_* \le \Pi \le \Pi^*$ is satisfied. In this subsection, we provide methods of computing the extreme points $\Pi^*$ and $\Pi_*$, and show that these extreme points respectively coincide with the extreme solutions of the Riccati inequality defined in Section 7.3. First, we compute $\Pi^*$ as a limit of solutions of finite dimensional optimization problems derived from (7.49).

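We assume that $(A, C, \bar C)$ is minimal, and define the vector

$$u_k := \begin{bmatrix} u(-1) \\ \vdots \\ u(-k) \end{bmatrix} \in \mathbb{R}^{kp}$$

and also for $\xi \in \mathbb{R}^n$, we define the set

$$\mathcal{S}_k(\xi) = \Big\{ u_k \;\Big|\; \xi = \sum_{t=-k}^{-1} (A^T)^{-t-1}C^Tu(t) \Big\} = \{ u_k \mid \xi = \mathcal{O}_k^Tu_k \}$$

Then we consider a finite dimensional optimization problem of the form

$$\xi^T\Pi_k\xi := \min_{u_k \in \mathcal{S}_k(\xi)} u_k^T T_+(k) u_k \ge 0 \qquad (7.58)$$

where $T_+(k)$ is the block Toeplitz matrix of (7.45), and $\mathcal{O}_k$ is the extended observability matrix with rank $n$. Since $\mathcal{S}_k(\xi) \subset \mathcal{S}_{k+1}(\xi) \subset \mathcal{S}(\xi)$, it follows that $\Pi_k \ge \Pi_{k+1} \ge \Pi^*$. Also, $\Pi^* > 0$ holds by definition, so that $\Pi_k$ is decreasing and bounded below. Thus $\Pi_k$ converges to $\Pi^*$, a solution to the original infinite dimensional problem.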

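The finite dimensional problem of (7.58) is a quadratic problem with a linear constraint, so that it can be solved via the Lagrange method. In fact, define

$$\mathcal{L} = \frac{1}{2}u_k^T T_+(k) u_k + \lambda_k^T(\xi - \mathcal{O}_k^Tu_k)$$

Then, from the optimality condition, we have

$$\frac{\partial\mathcal{L}}{\partial u_k} = 0 \implies T_+(k)u_k - \mathcal{O}_k\lambda_k = 0$$

and $\xi = \mathcal{O}_k^Tu_k$. Hence, we see that $\lambda_k = (\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k)^{-1}\xi$, where the inverse exists since $T_+(k)$ and $\mathcal{O}_k$ have full rank. Thus, the optimal solution is given by

$$u_k = T_+^{-1}(k)\mathcal{O}_k(\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k)^{-1}\xi, \qquad \xi^T\Pi_k\xi = \xi^T(\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k)^{-1}\xi$$

For simplicity, we define

$$\Omega_k := \mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k$$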

where we recall that

$$\mathcal{O}_k = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1} \end{bmatrix} \in \mathbb{R}^{kp\times n}$$
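The finite dimensional problem (7.58) can be checked numerically. The following Python/NumPy sketch evaluates $\Pi_k = (\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k)^{-1}$ for increasing $k$ and checks that the sequence is nonincreasing and stays above a known element of $\mathcal{P}$. It assumes that the $(i, j)$ block of the Toeplitz matrix of (7.45) is $\Lambda(i-j)$ with $\Lambda(-l) = \Lambda^T(l)$, matching the ordering of $u_k$ above. The model values are hypothetical and chosen so that the covariance data are positive real by construction: they come from a Markov model with $[Q\ S;\ S^T\ R] > 0$, whose state covariance $\Sigma$ therefore lies in $\mathcal{P}$. The helper names are illustrative.

```python
import numpy as np

def lyap(A, Q):
    """Solve Sigma = A Sigma A^T + Q by vectorization (adequate for small n)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

# Hypothetical Markov model (illustrative values, not from the text).
A = np.array([[0.5, 0.2], [0.0, -0.3]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.05]])
R = np.array([[1.0]])
Sigma = lyap(A, Q)                  # state covariance; Sigma belongs to P
Cbar = (A @ Sigma @ C.T + S).T      # Cbar^T = A Sigma C^T + S
Lam0 = C @ Sigma @ C.T + R          # Lambda(0)

def Lam(l):
    """Lambda(l) = C A^{l-1} Cbar^T for l > 0, Lambda(-l) = Lambda(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l).T

def T_plus(k):
    """Block Toeplitz matrix with (i, j) block Lambda(i - j)."""
    return np.block([[Lam(i - j) for j in range(k)] for i in range(k)])

def obsv(k):
    """Extended observability matrix O_k = [C; CA; ...; CA^{k-1}]."""
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(k)])

def Pi_k(k):
    """Optimal value matrix of (7.58): (O_k^T T_+(k)^{-1} O_k)^{-1}."""
    O = obsv(k)
    Omega = O.T @ np.linalg.solve(T_plus(k), O)
    return np.linalg.inv(Omega)

Pis = [Pi_k(k) for k in range(2, 25)]
# Smallest eigenvalue of Pi_k - Pi_{k+1}; nonnegative up to round-off if Pi_k decreases.
gaps = [np.linalg.eigvalsh(P1 - P2).min() for P1, P2 in zip(Pis, Pis[1:])]
```

Raising the upper limit of `range` pushes $\Pi_k$ further down toward $\Pi^*$.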


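We now derive a recursive equation relating $\Omega_k$ and $\Omega_{k+1}$. We see from (7.45) and (7.7) that

$$\Omega_{k+1} = \mathcal{O}_{k+1}^T T_+^{-1}(k+1)\mathcal{O}_{k+1} = \begin{bmatrix} C^T & A^T\mathcal{O}_k^T \end{bmatrix} \begin{bmatrix} \Lambda(0) & \bar C\mathcal{O}_k^T \\ \mathcal{O}_k\bar C^T & T_+(k) \end{bmatrix}^{-1} \begin{bmatrix} C \\ \mathcal{O}_kA \end{bmatrix} \qquad (7.59)$$

By the inversion results for the block matrix [see Problem 2.3 (c)],

$$\begin{bmatrix} \Lambda(0) & \bar C\mathcal{O}_k^T \\ \mathcal{O}_k\bar C^T & T_+(k) \end{bmatrix}^{-1} = \begin{bmatrix} \Delta_k & -\Delta_k\bar C\mathcal{O}_k^T T_+^{-1}(k) \\ -T_+^{-1}(k)\mathcal{O}_k\bar C^T\Delta_k & T_+^{-1}(k) + T_+^{-1}(k)\mathcal{O}_k\bar C^T\Delta_k\bar C\mathcal{O}_k^T T_+^{-1}(k) \end{bmatrix}$$

where

$$\Delta_k := [\Lambda(0) - \bar C\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k\bar C^T]^{-1} = (\Lambda(0) - \bar C\Omega_k\bar C^T)^{-1}$$

It therefore follows from (7.59) that

$$\begin{aligned}
\Omega_{k+1} &= C^T\Delta_kC - A^T\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k\bar C^T\Delta_kC - C^T\Delta_k\bar C\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_kA \\
&\quad + A^T\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_kA + A^T\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k\bar C^T\Delta_k\bar C\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_kA \\
&= A^T\Omega_kA + (C^T - A^T\Omega_k\bar C^T)(\Lambda(0) - \bar C\Omega_k\bar C^T)^{-1}(C - \bar C\Omega_kA)
\end{aligned}$$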
From the above result, we have a recursive algorithm for computing $\Pi^*$.

Algorithm 1. Compute the solution of the discrete-time Riccati equation

$$\Omega_{k+1} = A^T\Omega_kA + (C^T - A^T\Omega_k\bar C^T)(\Lambda(0) - \bar C\Omega_k\bar C^T)^{-1}(C - \bar C\Omega_kA) \qquad (7.60)$$

with the initial condition $\Omega_0 = 0$ to get $\Omega_\infty = \lim_{k\to\infty}\Omega_k$. Then, the maximum $\Pi^*$ in $\mathcal{P}$ and the associated covariance matrices are given by

$$\Pi^* = \Omega_\infty^{-1}, \quad Q^* = \Pi^* - A\Pi^*A^T, \quad S^* = \bar C^T - A\Pi^*C^T, \quad R^* = \Lambda(0) - C\Pi^*C^T$$

We see from the dual of Lemma 7.2 that the limit $\Omega_\infty$ is positive definite.


Remark 7.1. It follows from (7.25) that the dual LMI for (7.26) is given by

$$\bar M(\bar\Pi) := \begin{bmatrix} \bar\Pi - A^T\bar\Pi A & C^T - A^T\bar\Pi\bar C^T \\ C - \bar C\bar\Pi A & \Lambda(0) - \bar C\bar\Pi\bar C^T \end{bmatrix} \ge 0, \qquad \bar\Pi > 0 \qquad (7.61)$$

Thus, Algorithm 1 recursively computes the minimum solution of the dual Riccati equation associated with (7.61), thereby giving the minimal covariance matrix of the backward Markov model. Hence, the inverse of the limit gives the maximum solution $\Pi^*$ to the ARI associated with the LMI (7.26), and hence to the LMI (7.26) itself.


By using the discrete-time Riccati equation associated with the LMI (7.26), we readily derive the following algorithm.

Algorithm 2. Compute the solution of the discrete-time Riccati equation

$$\Omega_{k+1} = A\Omega_kA^T + (\bar C^T - A\Omega_kC^T)(\Lambda(0) - C\Omega_kC^T)^{-1}(\bar C - C\Omega_kA^T) \qquad (7.62)$$

with the initial condition $\Omega_0 = 0$ to get $\Omega_\infty = \lim_{k\to\infty}\Omega_k$. Then, the minimum $\Pi_*$ in $\mathcal{P}$ and the associated covariance matrices are given by

$$\Pi_* = \Omega_\infty, \quad Q_* = \Pi_* - A\Pi_*A^T, \quad S_* = \bar C^T - A\Pi_*C^T, \quad R_* = \Lambda(0) - C\Pi_*C^T$$

It can be shown from Lemma 7.2 that the limit $\Omega_\infty$ is positive definite.


Remark 7.2. The discrete-time Riccati equation in Algorithm 2 is the same as the discrete-time Riccati equation of (5.62). It is not difficult to show that

$$\Omega_k = \bar{\mathcal{C}}_k T_-^{-1}(k)\bar{\mathcal{C}}_k^T$$

satisfies (7.62) [see Problem 7.6], where

$$\bar{\mathcal{C}}_k = \begin{bmatrix} \bar C^T & A\bar C^T & \cdots & A^{k-1}\bar C^T \end{bmatrix}$$

Also, (7.62) is a recursive algorithm that computes the minimum solution of the ARE (7.37). Hence, the solution $\Pi_*$ of Algorithm 2 gives the minimum solution to the ARI associated with the LMI, so that $\Pi_*$ is the minimum solution to the LMI (7.26).
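Both recursions are easy to run numerically. The Python/NumPy sketch below is a minimal illustration, not the program referred to elsewhere in this book; it implements (7.62) once and obtains Algorithm 1 from it through the substitution $A \to A^T$, $C \leftrightarrow \bar C$, which maps (7.62) into (7.60). The data are those of Problem 7.5 ($A = 1/3$, $C = \bar C = \sqrt2/3$, $\Lambda(0) = 2/3$). Substituting these values into (7.62) shows that every iterate from $\Omega_1$ on equals $1/3$, so the sketch should return $\Pi_* = 1/3$ and $\Pi^* = 3$. The function names, the iteration cap and the stopping tolerance are assumptions.

```python
import numpy as np

def riccati_iterate(A, C, Cbar, Lam0, n_iter=10000, tol=1e-13):
    """Iterate (7.62) from Omega_0 = 0 and return the (approximate) limit.

    Omega <- A Omega A^T + G (Lam0 - C Omega C^T)^{-1} G^T, G = Cbar^T - A Omega C^T
    """
    Om = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(n_iter):
        G = Cbar.T - A @ Om @ C.T
        Rk = Lam0 - C @ Om @ C.T
        Om_new = A @ Om @ A.T + G @ np.linalg.solve(Rk, G.T)
        Om_new = 0.5 * (Om_new + Om_new.T)   # symmetrize against round-off
        if np.max(np.abs(Om_new - Om)) < tol:
            return Om_new
        Om = Om_new
    return Om

def extremal_solutions(A, C, Cbar, Lam0):
    """Return (Pi_min, Pi_max) via Algorithm 2 and Algorithm 1 (dual recursion)."""
    Pi_min = riccati_iterate(A, C, Cbar, Lam0)                    # (7.62)
    Pi_max = np.linalg.inv(riccati_iterate(A.T, Cbar, C, Lam0))   # (7.60), then invert
    return Pi_min, Pi_max

def lmi(Pi, A, C, Cbar, Lam0):
    """The matrix M(Pi) of the LMI (7.26)."""
    return np.block([[Pi - A @ Pi @ A.T, Cbar.T - A @ Pi @ C.T],
                     [Cbar - C @ Pi @ A.T, Lam0 - C @ Pi @ C.T]])

# Data of Problem 7.5
A = np.array([[1 / 3]])
C = np.array([[np.sqrt(2) / 3]])
Cbar = C.copy()
Lam0 = np.array([[2 / 3]])
Pi_min, Pi_max = extremal_solutions(A, C, Cbar, Lam0)   # expected near 1/3 and 3
```

At $\Pi^* = 3$ one finds $R^* = \Lambda(0) - C\Pi^*C^T = 0$, so $M(\Pi^*)$ is singular; a positivity check on $M(\Pi^*)$ therefore needs a small round-off tolerance.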
The existence of maximum and minimum solutions of the LMI (7.26) has been proved, and methods for computing them have been established. This implies that the stochastic minimal realization problem stated by Faurre is now completely solved. The most difficult task in the above procedure is, however, to obtain a minimal realization $(A, C, \bar C)$ by the deterministic realization algorithm. In Chapter 8, we shall show that this difficulty is resolved by means of the approach based on the canonical correlation analysis (CCA).


7.5 Algebraic Riccati-like Equations 


We derive algebraic Riccati-like equations satisfied by the difference $\Theta := \Pi^* - \Pi_*$ between the maximum and minimum solutions. The lemmas in this section are used in proving Theorem 7.5 below.

Consider the ARE of (7.37):

$$\Pi = A\Pi A^T + (\bar C^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1}(\bar C - C\Pi A^T) \qquad (7.63)$$

It should be noted that this ARE has the same form as the ARE of (5.75), where the minimum solution $\Pi_*$ equals the corresponding solution of (5.75).

Let the stationary Kalman gain be defined by

$$K := (\bar C^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1} \qquad (7.64)$$

and let $A_K := A - KC$ be the closed-loop matrix. Then, we have the following lemma.


Lemma 7.5. The ARE of (7.63) is expressed as

$$\Pi = A_K\Pi A_K^T - K\Lambda(0)K^T + K\bar C + \bar C^TK^T \qquad (7.65)$$

Proof. Use (7.64) and the definition of $A_K$. See Problem 7.7.


Let $\Pi^*$ and $\Pi_*$ be the maximum and minimum solutions of (7.63), respectively. In terms of these solutions, we define

$$K^* := (\bar C^T - A\Pi^*C^T)(\Lambda(0) - C\Pi^*C^T)^{-1}, \qquad K_* := (\bar C^T - A\Pi_*C^T)(\Lambda(0) - C\Pi_*C^T)^{-1}$$

and $A^* := A - K^*C$, $A_* := A - K_*C$. It then follows from Lemma 7.5 that

$$\Pi^* = A^*\Pi^*(A^*)^T - K^*\Lambda(0)(K^*)^T + K^*\bar C + \bar C^T(K^*)^T \qquad (7.66)$$

$$\Pi_* = A_*\Pi_*A_*^T - K_*\Lambda(0)K_*^T + K_*\bar C + \bar C^TK_*^T \qquad (7.67)$$

The following lemma derives the Riccati-like equations satisfied by the difference between the maximum and the minimum solutions of the ARE (7.63).
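Lemma 7.6. Let $\Theta := \Pi^* - \Pi_*$. Then, $\Theta$ satisfies the following algebraic Riccati-like equations

$$\Theta = A_*\Theta A_*^T + (K_* - K^*)(\Lambda(0) - C\Pi^*C^T)(K_* - K^*)^T \qquad (7.68)$$

and

$$\Theta = A_*\Theta A_*^T + A_*\Theta C^T(\Lambda(0) - C\Pi^*C^T)^{-1}C\Theta A_*^T \qquad (7.69)$$

Proof. Since $A^* = A_* + (K_* - K^*)C$, it follows from (7.66) that

$$\begin{aligned}
\Pi^* &= (A_* + (K_* - K^*)C)\Pi^*(A_* + (K_* - K^*)C)^T - K^*\Lambda(0)(K^*)^T + K^*\bar C + \bar C^T(K^*)^T \\
&= A_*\Pi^*A_*^T + A_*\Pi^*C^T(K_* - K^*)^T + (K_* - K^*)C\Pi^*A_*^T \\
&\quad + (K_* - K^*)C\Pi^*C^T(K_* - K^*)^T - K^*\Lambda(0)(K^*)^T + K^*\bar C + \bar C^T(K^*)^T
\end{aligned} \qquad (7.70)$$

Also, from $A_* = A - K_*C$,

$$\begin{aligned}
A_*\Pi^*C^T &= A\Pi^*C^T - K_*C\Pi^*C^T \\
&= \bar C^T - K^*(\Lambda(0) - C\Pi^*C^T) - K_*C\Pi^*C^T \\
&= \bar C^T + (K_* - K^*)(\Lambda(0) - C\Pi^*C^T) - K_*\Lambda(0)
\end{aligned} \qquad (7.71)$$

Thus, using (7.71), we see that (7.70) becomes

$$\begin{aligned}
\Pi^* &= A_*\Pi^*A_*^T - (K_* - K^*)C\Pi^*C^T(K_* - K^*)^T + K_*\bar C + \bar C^TK_*^T \\
&\quad + K^*\Lambda(0)(K^*)^T - K_*\Lambda(0)(K^*)^T - K^*\Lambda(0)K_*^T
\end{aligned}$$

Taking the difference between the above equation and (7.67) gives

$$\begin{aligned}
\Theta &= A_*\Theta A_*^T - (K_* - K^*)C\Pi^*C^T(K_* - K^*)^T \\
&\quad + K^*\Lambda(0)(K^*)^T - K_*\Lambda(0)(K^*)^T - K^*\Lambda(0)K_*^T + K_*\Lambda(0)K_*^T
\end{aligned}$$

Rearranging the terms yields (7.68). Similarly to the derivation of (7.71), we have

$$A_*\Pi_*C^T = A\Pi_*C^T - K_*C\Pi_*C^T = \bar C^T - K_*\Lambda(0) \qquad (7.72)$$

Taking the difference between (7.71) and (7.72) yields

$$A_*\Theta C^T = A_*\Pi^*C^T - A_*\Pi_*C^T = (K_* - K^*)(\Lambda(0) - C\Pi^*C^T)$$

Applying this relation to (7.68) leads to (7.69).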


Lemma 7.7. If $\Theta$ is nonsingular, the inverse $\Theta^{-1}$ satisfies

$$\Theta^{-1} = A_*^T\Theta^{-1}A_* + C^T(\Lambda(0) - C\Pi_*C^T)^{-1}C \qquad (7.73)$$

Proof. From (7.69), we see that if $\Theta$ is nonsingular, so is $A_*$. Thus it follows that

$$A_*^{-1}\Theta(A_*^T)^{-1} = \Theta + \Theta C^T(\Lambda(0) - C\Pi_*C^T - C\Theta C^T)^{-1}C\Theta$$

Application of the matrix inversion lemma of (5.10) gives

$$A_*^T\Theta^{-1}A_* = \Theta^{-1} - C^T(\Lambda(0) - C\Pi_*C^T)^{-1}C$$

This completes the proof of (7.73).
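Lemmas 7.5 and 7.7 can be sanity-checked numerically. The NumPy sketch below uses a hypothetical second-order Markov model as a strictly positive real test case (an assumption, not an example from the text), computes $\Pi_*$ and $\Pi^*$ with the recursions (7.62) and (7.60), and evaluates the residuals of (7.65) at $\Pi = \Pi_*$ and of (7.73). Both residuals should be at round-off level. Only $R_* = \Lambda(0) - C\Pi_*C^T$ is inverted, so the check does not rely on $R^*$ being nonsingular. Helper names are illustrative.

```python
import numpy as np

def lyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

def riccati_iterate(A, C, Cbar, Lam0, n_iter=10000, tol=1e-13):
    """Recursion (7.62) from Omega_0 = 0."""
    Om = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(n_iter):
        G = Cbar.T - A @ Om @ C.T
        Om_new = A @ Om @ A.T + G @ np.linalg.solve(Lam0 - C @ Om @ C.T, G.T)
        Om_new = 0.5 * (Om_new + Om_new.T)
        if np.max(np.abs(Om_new - Om)) < tol:
            return Om_new
        Om = Om_new
    return Om

# Hypothetical model (illustrative values only)
A = np.array([[0.5, 0.2], [0.0, -0.3]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.05]])
R = np.array([[1.0]])
Sigma = lyap(A, Q)
Cbar = (A @ Sigma @ C.T + S).T
Lam0 = C @ Sigma @ C.T + R

Pi_min = riccati_iterate(A, C, Cbar, Lam0)                    # Algorithm 2
Pi_max = np.linalg.inv(riccati_iterate(A.T, Cbar, C, Lam0))   # Algorithm 1

R_min = Lam0 - C @ Pi_min @ C.T                                # R_*
K_min = (Cbar.T - A @ Pi_min @ C.T) @ np.linalg.inv(R_min)    # K_* of (7.64)
A_min = A - K_min @ C                                          # A_* = A - K_* C

# (7.65) evaluated at Pi = Pi_*; the residual should vanish
res_765 = (A_min @ Pi_min @ A_min.T - K_min @ Lam0 @ K_min.T
           + K_min @ Cbar + Cbar.T @ K_min.T - Pi_min)
# (7.73): Theta^{-1} = A_*^T Theta^{-1} A_* + C^T R_*^{-1} C
Theta_inv = np.linalg.inv(Pi_max - Pi_min)
res_773 = A_min.T @ Theta_inv @ A_min + C.T @ np.linalg.solve(R_min, C) - Theta_inv
```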


We shall consider the strictly positive real conditions and a related theorem due 
to Faurre [47] in the next section. 


7.6 Strictly Positive Real Conditions 


In this section, we show equivalent conditions for strict positive realness, which will be used for proving some results related to reduced stochastic realization. In the following, we assume that there exist the maximum and minimum solutions $\Pi^*$ and $\Pi_*$ of the ARE (7.63).


Definition 7.3. (Faurre [47]) For the minimum solution $\Pi_*$, we define

$$Q_* := \Pi_* - A\Pi_*A^T, \qquad S_* := \bar C^T - A\Pi_*C^T, \qquad R_* := \Lambda(0) - C\Pi_*C^T$$

Suppose that the following inequality

$$R_* = \Lambda(0) - C\Pi_*C^T > 0 \qquad (7.74)$$

holds. Then, the stochastic realization problem is called regular.


Theorem 7.5. Let $(A, C, \bar C)$ be a minimal realization. Then the following conditions (i) to (iv) are equivalent conditions for strict positive realness.
(i) $Z(z)$ is strictly positive real, or $T_+$ is coercive.
(ii) $\Theta = \Pi^* - \Pi_*$ is positive definite.
(iii) $R_* > 0$, and $A_* := A - S_*R_*^{-1}C$ is stable.
(iv) The interior of $\mathcal{P}$ is non-void. In other words, there exists $\Pi \in \mathcal{P}$ such that the corresponding covariance matrices are positive definite, i.e.,

$$\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} > 0$$
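Proof. 1° (i) ⇒ (ii). From (7.44), there exists $\rho > 0$ such that the operator $T_+ - \rho I$ corresponding to $(A, C, \bar C, \Lambda(0) - \rho I_p)$ is positive real. Let $\mathcal{P}_\rho$ be the set of solutions of the LMI with $(A, C, \bar C, \Lambda(0) - \rho I_p)$. Then, we see that $\mathcal{P}_\rho \subset \mathcal{P}$. Let $\Pi_\rho \in \mathcal{P}_\rho$ and define

$$Q_\rho = \Pi_\rho - A\Pi_\rho A^T, \qquad S_\rho = \bar C^T - A\Pi_\rho C^T, \qquad R_\rho = \Lambda(0) - \rho I_p - C\Pi_\rho C^T$$

It follows from Lemma 7.4 and (7.49) that

$$\xi^T\Pi^*\xi = \xi^T\Pi_\rho\xi + \min_{u \in \mathcal{S}(\xi)} \Big\{ \sum_{t=-\infty}^{-1} \begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} Q_\rho & S_\rho \\ S_\rho^T & R_\rho \end{bmatrix} \begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} + \rho\sum_{t=-\infty}^{-1} u^T(t)u(t) \Big\}$$

The constraint equation for the above optimization problem is given by

$$\xi(t+1) = A^T\xi(t) + C^Tu(t), \qquad \xi(0) = \xi, \qquad \xi(-\infty) = 0$$

Then, from $\begin{bmatrix} Q_\rho & S_\rho \\ S_\rho^T & R_\rho \end{bmatrix} \ge 0$, we see that

$$\xi^T(\Pi^* - \Pi_\rho)\xi \ge \rho\min_{u \in \mathcal{S}(\xi)} \sum_{t=-\infty}^{-1} u^T(t)u(t) \qquad (7.75)$$

Referring to the derivation of Algorithm 1 in Subsection 7.4.2, we observe that the optimal value of the minimization problem in (7.75) is

$$\min_{u \in \mathcal{S}(\xi)} \sum_{t=-\infty}^{-1} u^T(t)u(t) = \xi^T(\mathcal{O}^T\mathcal{O})^{-1}\xi > 0, \qquad \xi \ne 0 \qquad (7.76)$$

where, since $A$ is stable and $(C, A)$ is observable,

$$\mathcal{O}^T\mathcal{O} = \sum_{i=0}^{\infty} (A^T)^iC^TCA^i > 0$$

Therefore, since $\Pi_\rho \in \mathcal{P}$ implies $\Pi_\rho \ge \Pi_*$, we have

$$\xi^T\Theta\xi \ge \xi^T(\Pi^* - \Pi_\rho)\xi \ge \rho\,\xi^T(\mathcal{O}^T\mathcal{O})^{-1}\xi > 0, \qquad \xi \ne 0$$

This completes the proof of (i) ⇒ (ii).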

2° Next we show (ii) ⇒ (iii). For $\Pi^*$ and $\Pi_*$, we define $R^* = \Lambda(0) - C\Pi^*C^T$ and $R_* = \Lambda(0) - C\Pi_*C^T$. Suppose that $\Pi^* - \Pi_* > 0$ holds, but $R_*$ is not positive definite. Then there exists $\eta \in \mathbb{R}^p$ such that $R_*\eta = 0$, $\eta \ne 0$. Hence,

$$(\Lambda(0) - C\Pi_*C^T)\eta = 0$$

holds. Noting that $\Lambda(0) > 0$, we have $C^T\eta \ne 0$. Since $R_* \ge R^* \ge 0$, it also follows that $R^*\eta = 0$. Thus we have

$$(R_* - R^*)\eta = C(\Pi^* - \Pi_*)C^T\eta = 0$$

However, since $C^T\eta \ne 0$, we see that $\Pi^* - \Pi_*$ is not positive definite, a contradiction. Thus we have $R_* > 0$. It follows from Lemma 7.7 that if $\Theta = \Pi^* - \Pi_* > 0$, the inverse $\Theta^{-1}$ satisfies

$$\Theta^{-1} = A_*^T\Theta^{-1}A_* + C^T(\Lambda(0) - C\Pi_*C^T)^{-1}C \qquad (7.77)$$

Since $(C, A_*)$ is observable, the Lyapunov theorem implies that if $\Theta^{-1} > 0$, $A_*$ is stable.

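3° We prove (iii) ⇒ (iv); this part is somewhat involved. By the hypothesis, $A_*$ is stable and $R_* > 0$. Let $V > 0$, $V \in \mathbb{R}^{n\times n}$, and consider the Lyapunov equation

$$X = A_*^TXA_* + C^T(\Lambda(0) - C\Pi_*C^T)^{-1}C + V \qquad (7.78)$$

Obviously, we have $X > 0$, and hence $X^{-1}$ exists. In fact, even if $V = 0$, we have $X > 0$ due to the observability of $(C, A_*)$.

We derive the equation satisfied by the inverse $X^{-1}$. From (7.78), we get

$$X - C^T(\Lambda(0) - C\Pi_*C^T)^{-1}C = A_*^TXA_* + V$$

Applying the matrix inversion lemma of (5.10) to both sides of the above equation yields

$$\begin{aligned}
&X^{-1} + X^{-1}C^T(\Lambda(0) - C[\Pi_* + X^{-1}]C^T)^{-1}CX^{-1} \\
&\quad = A_*^{-1}X^{-1}A_*^{-T} - A_*^{-1}X^{-1}A_*^{-T}(A_*^{-1}X^{-1}A_*^{-T} + V^{-1})^{-1}A_*^{-1}X^{-1}A_*^{-T}
\end{aligned}$$

where $A_*$ is assumed to be nonsingular. Pre-multiplying the above equation by $A_*$ and post-multiplying by $A_*^T$, we have

$$A_*X^{-1}A_*^T + A_*X^{-1}C^T(\Lambda(0) - C[\Pi_* + X^{-1}]C^T)^{-1}CX^{-1}A_*^T = X^{-1} - (X + XA_*V^{-1}A_*^TX)^{-1}$$

Thus the inverse $X^{-1}$ satisfies

$$\begin{aligned}
X^{-1} &= A_*X^{-1}A_*^T + A_*X^{-1}C^T(\Lambda(0) - C[\Pi_* + X^{-1}]C^T)^{-1}CX^{-1}A_*^T \\
&\quad + (X + XA_*V^{-1}A_*^TX)^{-1}
\end{aligned} \qquad (7.79)$$

Note that if $V \to 0$, then it follows that $X^{-1} \to \Theta = \Pi^* - \Pi_*$, and hence the above equation reduces to (7.69).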
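Now define $\Pi := \Pi_* + X^{-1}$. Then the covariance matrices associated with $\Pi$ are given by

$$\begin{aligned}
Q &= (\Pi_* + X^{-1}) - A(\Pi_* + X^{-1})A^T \\
S &= \bar C^T - A(\Pi_* + X^{-1})C^T \\
R &= \Lambda(0) - C(\Pi_* + X^{-1})C^T
\end{aligned}$$

We show that $(Q, S, R)$ satisfy the condition (iv). If $V \to \infty$, then $X^{-1} \to 0$, so that for a sufficiently large $V$, we get $R > 0$. It thus suffices to show that

$$-\mathrm{Ric}(\Pi) := Q - SR^{-1}S^T > 0$$

Since $\Pi_*$ satisfies the ARE of (7.63), and since $A = A_* + K_*C$, we have

$$\begin{aligned}
Q &= X^{-1} - AX^{-1}A^T + K_*(\Lambda(0) - C\Pi_*C^T)K_*^T \\
&= X^{-1} - (A_* + K_*C)X^{-1}(A_* + K_*C)^T + K_*(\Lambda(0) - C\Pi_*C^T)K_*^T
\end{aligned}$$

Moreover, from (7.79), it can be shown that

$$\begin{aligned}
Q &= K_*(\Lambda(0) - C\Pi_*C^T)K_*^T + A_*X^{-1}C^TR^{-1}CX^{-1}A_*^T - A_*X^{-1}C^TK_*^T - K_*CX^{-1}A_*^T \\
&\quad - K_*CX^{-1}C^TK_*^T + (X + XA_*V^{-1}A_*^TX)^{-1} \\
&= K_*RK_*^T + A_*X^{-1}C^TR^{-1}CX^{-1}A_*^T - A_*X^{-1}C^TK_*^T - K_*CX^{-1}A_*^T + (X + XA_*V^{-1}A_*^TX)^{-1} \\
&= (K_*R - A_*X^{-1}C^T)R^{-1}(K_*R - A_*X^{-1}C^T)^T + (X + XA_*V^{-1}A_*^TX)^{-1}
\end{aligned} \qquad (7.80)$$

Also, by utilizing (7.72), we have

$$\begin{aligned}
S &= \bar C^T - A\Pi_*C^T - AX^{-1}C^T \\
&= K_*\Lambda(0) - K_*C\Pi_*C^T - (A_* + K_*C)X^{-1}C^T \\
&= K_*R - A_*X^{-1}C^T
\end{aligned}$$

and hence

$$SR^{-1}S^T = (K_*R - A_*X^{-1}C^T)R^{-1}(K_*R - A_*X^{-1}C^T)^T$$

Subtracting the above equation from (7.80) yields

$$Q - SR^{-1}S^T = (X + XA_*V^{-1}A_*^TX)^{-1} > 0$$

This completes the proof of (iii) ⇒ (iv).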
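4° We finally prove (iv) ⇒ (i). From the assumption, there exist an interior point $\Pi_0 \in \mathcal{P}$ and $\rho > 0$ such that

$$\begin{bmatrix} Q_0 & S_0 \\ S_0^T & R_0 \end{bmatrix} \ge \rho I_{n+p} \implies \begin{bmatrix} Q_0 & S_0 \\ S_0^T & R_0 - \rho I_p \end{bmatrix} \ge 0$$

Since $\Pi_0$ is a solution of the LMI with $(A, C, \bar C, \Lambda(0) - \rho I_p)$, we see that $T_+$ is coercive.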
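Step 3° of the proof is constructive, so it can be traced numerically. The NumPy sketch below solves the Lyapunov equation (7.78) for the hypothetical choice $V = 100\,I_n$, forms $\Pi = \Pi_* + X^{-1}$, and checks that the associated matrix $[Q\ S;\ S^T\ R]$ is positive definite and that $Q - SR^{-1}S^T$ agrees with $(X + XA_*V^{-1}A_*^TX)^{-1}$. The model values and helper names are assumptions made for the illustration; $\Pi_*$ is computed by the recursion (7.62).

```python
import numpy as np

def lyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

def riccati_iterate(A, C, Cbar, Lam0, n_iter=10000, tol=1e-13):
    """Recursion (7.62) from Omega_0 = 0; the limit approximates Pi_*."""
    Om = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(n_iter):
        G = Cbar.T - A @ Om @ C.T
        Om_new = A @ Om @ A.T + G @ np.linalg.solve(Lam0 - C @ Om @ C.T, G.T)
        Om_new = 0.5 * (Om_new + Om_new.T)
        if np.max(np.abs(Om_new - Om)) < tol:
            return Om_new
        Om = Om_new
    return Om

# Hypothetical model (illustrative values only)
A = np.array([[0.5, 0.2], [0.0, -0.3]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.05]])
R = np.array([[1.0]])
Sigma = lyap(A, Q)
Cbar = (A @ Sigma @ C.T + S).T
Lam0 = C @ Sigma @ C.T + R

Pi_min = riccati_iterate(A, C, Cbar, Lam0)
R_min = Lam0 - C @ Pi_min @ C.T
K_min = (Cbar.T - A @ Pi_min @ C.T) @ np.linalg.inv(R_min)
A_min = A - K_min @ C                                     # A_*, stable

V = 100.0 * np.eye(2)                                     # hypothetical "large" V > 0
X = lyap(A_min.T, C.T @ np.linalg.solve(R_min, C) + V)    # (7.78)
Pi = Pi_min + np.linalg.inv(X)                            # candidate interior point
Qn = Pi - A @ Pi @ A.T
Sn = Cbar.T - A @ Pi @ C.T
Rn = Lam0 - C @ Pi @ C.T
M = np.block([[Qn, Sn], [Sn.T, Rn]])
schur = Qn - Sn @ np.linalg.solve(Rn, Sn.T)               # -Ric(Pi)
target = np.linalg.inv(X + X @ A_min @ np.linalg.inv(V) @ A_min.T @ X)
```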
We present without proof a lemma related to Theorem 7.5 (ii), which will be used in Section 8.6.

Lemma 7.8. Suppose that $\Lambda(0) > 0$ holds in $Z(z)$ in (7.11), but we do not assume that $(A, C, \bar C)$ is minimal. Then, if the LMI of (7.26) has two positive definite solutions $\Pi_1$ and $\Pi_2$, and if $\Pi_1 - \Pi_2 > 0$, then $Z(z)$ is strictly positive real.

Proof. If $(A, C, \bar C)$ is minimal, the result is obvious from Theorem 7.5. A proof of the non-minimal case is reduced to the minimal case; see [106].


7.7 Stochastic Realization Algorithm 


By using the deterministic realization algorithm of Lemma 6.1, we have the follow- 
ing stochastic realization algorithm. 


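Lemma 7.9. (Stochastic realization algorithm [15])

Step 1: For given covariance matrices $\{\Lambda(l),\ l = 0, 1, \ldots, L\}$, we form the block Hankel matrix

$$H_{kk} = \begin{bmatrix} \Lambda(1) & \Lambda(2) & \cdots & \Lambda(k) \\ \Lambda(2) & \Lambda(3) & \cdots & \Lambda(k+1) \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda(k) & \Lambda(k+1) & \cdots & \Lambda(2k-1) \end{bmatrix} \in \mathbb{R}^{kp\times kp}$$

where $2k - 1 \le L$ and $k > n$.

Step 2: Compute the SVD of $H_{kk}$ such that

$$H_{kk} = \begin{bmatrix} U_n & U_s \end{bmatrix} \begin{bmatrix} \Sigma_n & 0 \\ 0 & \Sigma_s \end{bmatrix} \begin{bmatrix} V_n^T \\ V_s^T \end{bmatrix} \approx U_n\Sigma_nV_n^T \qquad (7.81)$$

where $\Sigma_n$ contains the $n$ largest singular values of $H_{kk}$, and the other singular values are small, i.e., $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \gg \sigma_{n+1} \ge \cdots$.

Step 3: Based on the SVD of (7.81), the extended observability and reachability matrices are defined by

$$\mathcal{O}_k = U_n\Sigma_n^{1/2}, \qquad \mathcal{C}_k = \Sigma_n^{1/2}V_n^T \qquad (7.82)$$

Step 4: Compute the matrices $A$, $C$, $\bar C$ by

$$A = \mathcal{O}_{k-1}^{\dagger}\mathcal{O}_k^{\uparrow}, \qquad C = \mathcal{O}_k(1\!:\!p,\ 1\!:\!n), \qquad \bar C^T = \mathcal{C}_k(1\!:\!n,\ 1\!:\!p) \qquad (7.83)$$

where $\mathcal{O}_{k-1} = \mathcal{O}_k(1\!:\!(k-1)p,\ 1\!:\!n)$ and $\mathcal{O}_k^{\uparrow} = \mathcal{O}_k(p+1\!:\!kp,\ 1\!:\!n)$.

Step 5: By using $(A, C, \bar C, \Lambda(0))$ so obtained, we define the ARE

$$\Pi = A\Pi A^T + (\bar C^T - A\Pi C^T)(\Lambda(0) - C\Pi C^T)^{-1}(\bar C - C\Pi A^T) \qquad (7.84)$$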
Compute the minimum (or stabilizing) solution $\Pi_* \ge 0$ by the method described in Subsection 7.4.2, or by the method of Lemma 5.14, to obtain the Kalman gain

$$K = (\bar C^T - A\Pi_*C^T)(\Lambda(0) - C\Pi_*C^T)^{-1} \qquad (7.85)$$

Then we have an innovation model

$$x(t+1) = Ax(t) + Ke(t) \qquad (7.86a)$$

$$y(t) = Cx(t) + e(t) \qquad (7.86b)$$

where $\mathrm{cov}\{e(t)\} = \Lambda(0) - C\Pi_*C^T$.
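Steps 1 to 5 of Lemma 7.9 translate almost line by line into NumPy. The sketch below is not the program of Table D.3; it assumes exact covariance data $\Lambda(0), \ldots, \Lambda(2k-1)$, takes $\mathcal{O}_{k-1}$ to be the first $(k-1)p$ rows of $\mathcal{O}_k$, and computes $\Pi_*$ by the recursion (7.62). The test data come from a hypothetical second-order Markov model, so the realized triple should reproduce $\Lambda(l)$ up to a similarity transformation, and the innovation covariance $\Lambda(0) - C\Pi_*C^T$, which is invariant under similarity, should match the value computed from the original model. All names are illustrative.

```python
import numpy as np

def lyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

def riccati_iterate(A, C, Cbar, Lam0, n_iter=10000, tol=1e-13):
    """Recursion (7.62) from Omega_0 = 0; the limit approximates Pi_*."""
    Om = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(n_iter):
        G = Cbar.T - A @ Om @ C.T
        Om_new = A @ Om @ A.T + G @ np.linalg.solve(Lam0 - C @ Om @ C.T, G.T)
        Om_new = 0.5 * (Om_new + Om_new.T)
        if np.max(np.abs(Om_new - Om)) < tol:
            return Om_new
        Om = Om_new
    return Om

def stochastic_realization(Lams, n, k):
    """Sketch of Steps 1-5 of Lemma 7.9; Lams[l] = Lambda(l), l = 0, ..., 2k-1."""
    p = Lams[0].shape[0]
    H = np.block([[Lams[i + j + 1] for j in range(k)] for i in range(k)])  # Step 1
    U, s, Vt = np.linalg.svd(H)                                            # Step 2
    sq = np.sqrt(s[:n])
    Ok = U[:, :n] * sq              # Step 3: O_k = U_n Sigma_n^{1/2}
    Ck = sq[:, None] * Vt[:n, :]    #         C_k = Sigma_n^{1/2} V_n^T
    A = np.linalg.pinv(Ok[:-p, :]) @ Ok[p:, :]                             # Step 4
    C = Ok[:p, :]
    Cbar = Ck[:, :p].T
    Lam0 = Lams[0]
    Pi = riccati_iterate(A, C, Cbar, Lam0)                                 # Step 5
    Re = Lam0 - C @ Pi @ C.T        # innovation covariance
    K = (Cbar.T - A @ Pi @ C.T) @ np.linalg.inv(Re)                        # (7.85)
    return A, C, Cbar, K, Re

# Exact covariances of a hypothetical Markov model (illustrative values).
A0 = np.array([[0.5, 0.2], [0.0, -0.3]])
C0 = np.array([[1.0, 0.5]])
Q0 = np.array([[1.0, 0.2], [0.2, 0.5]])
S0 = np.array([[0.1], [0.05]])
R0 = np.array([[1.0]])
Sigma0 = lyap(A0, Q0)
Cbar0 = (A0 @ Sigma0 @ C0.T + S0).T
Lams = [C0 @ Sigma0 @ C0.T + R0] + [
    C0 @ np.linalg.matrix_power(A0, l - 1) @ Cbar0.T for l in range(1, 10)]

A_hat, C_hat, Cbar_hat, K_hat, Re_hat = stochastic_realization(Lams, n=2, k=5)
# Reference innovation covariance computed from the original model
Re_true = Lams[0] - C0 @ riccati_iterate(A0, C0, Cbar0, Lams[0]) @ C0.T
```

With estimated rather than exact covariances, the same code may fail in Step 5 for the reason given in the next paragraph.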


Since the covariance matrix of the innovation process $e$ of (7.86) is not an identity matrix, note that the Markov model of (7.86) is different from the Markov model of (7.38). In fact, $K$ in (7.86) is expressed as $K = BD^{-1}$ using $B$ and $D$ in (7.38).

A crucial problem in this algorithm is how to compute accurate estimates of covariance matrices from finite measured data. To get good estimates, we need a large amount of data. If the covariance estimates are inaccurate, then the data $(A, C, \bar C, \Lambda(0))$ may not be positive real, and hence the ARE of (7.84) may have no stabilizing solution; see [58, 106, 154].


7.8 Notes and References 


• By using the deterministic realization theory together with the LMI and AREs, Faurre [45–47] has developed a complete theory of stochastic realization. Other references relevant to this chapter are Aoki [15], Van Overschee and De Moor [163, 165], and Lindquist and Picci [106, 107].


• In Section 7.1, as preliminaries, we have introduced the covariance matrices and spectral density matrices of a stationary process, and positive real matrices. In Section 7.2, we have defined the problem of stochastic realization for a stationary process based on [46].


• In Section 7.3, by using the results of [46, 107], we have shown that the stochastic realization problem can be solved by means of the LMI and the associated ARI and ARE. Some simple numerical examples are also included to illustrate the procedure of stochastic realization, including solutions of the associated ARIs, spectral factors and innovation models.


• Section 7.4 deals with the positivity of covariance data and the existence of Markov models. By using the fact that there exists a one-to-one correspondence between a solution of the LMI and a Markov model, we have shown that the set of all solutions of the LMI is a closed bounded convex set, and that it has minimum and maximum elements under the assumption that the covariance data are positive real; the proofs are based on the solutions of related optimal control problems due to Faurre [45, 47]. Also, two recursive methods to compute the maximum and minimum solutions of the ARE are provided.
• In Section 7.5, we have introduced algebraic Riccati-like equations satisfied by the difference of the maximum and minimum solutions of the LMI, together with proofs. Section 7.6 provides some equivalent conditions under which the given data are strictly positive real. A different proof of the positive real lemma based on convexity theory is found in [135].


• In Section 7.7, a stochastic subspace identification algorithm is presented by use of the deterministic realization algorithm of Lemma 6.1. A program of this algorithm is provided in Table D.3. However, this algorithm may fail, since the estimated finite covariance sequence may not be positive real; see [38, 106] for details. In Section 7.10, a proof of Lemma 7.4 is included.


7.9 Problems 
7.1 Let $Z(z) = B(z)/A(z)$, and let

$$A(e^{j\omega}) := a(\omega) + jb(\omega), \qquad B(e^{j\omega}) := c(\omega) + jd(\omega)$$

Derive a condition such that $Z(z)$ is positive real in terms of $a(\omega)$, $b(\omega)$, $c(\omega)$, $d(\omega)$. Also, derive a positive real condition for the function

$$Z(z) = \frac{bz + c}{z + a}$$

7.2 Find the condition such that the second-order transfer function

$$A(z) = 1 + a_1z^{-1} + a_2z^{-2}$$

is positive real.

7.3 Find the condition such that

$$\frac{1}{A(z)} - \frac{1}{2} = \frac{1}{1 + a_1z^{-1} + a_2z^{-2}} - \frac{1}{2}$$

is positive real. Note that this condition appears in the convergence analysis of the recursive extended least-squares algorithm for ARMA models [109, 145].
7.4 Prove the following estimate for the matrix norm.

$$\left\| \begin{bmatrix} A & B \\ C & D \end{bmatrix} \right\| \le \|A\| + \|B\| + \|C\| + \|D\|$$

From this estimate, we can prove the continuity of $f(\cdot)$ of Lemma 7.3.
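7.5 Let $A = 1/3$, $C = \bar C = \sqrt{2}/3$, $\Lambda(0) = 2/3$. Show that the LMI of (7.26) is given by

$$M(\Pi) = \begin{bmatrix} \frac{8}{9}\Pi & \frac{\sqrt2}{3} - \frac{\sqrt2}{9}\Pi \\ \frac{\sqrt2}{3} - \frac{\sqrt2}{9}\Pi & \frac{2}{3} - \frac{2}{9}\Pi \end{bmatrix} \ge 0$$

Compute the spectral factors corresponding to $\Pi_*$ and $\Pi^*$.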
7.6 Show that $\Omega_k = \bar{\mathcal{C}}_kT_-^{-1}(k)\bar{\mathcal{C}}_k^T$ satisfies (7.62).
7.7 Derive (7.65) from (7.63).
7.8 Solve the optimal control problem (7.76) in the proof of Theorem 7.5.


7.10 Appendix: Proof of Lemma 7.4 


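Define the positive definite function $V(t) = \xi^T(t)\Pi\xi(t)$ and compute the difference. It follows from (7.48) that

$$\begin{aligned}
V(t+1) - V(t) &= [\xi^T(t)A + u^T(t)C]\Pi[A^T\xi(t) + C^Tu(t)] - \xi^T(t)\Pi\xi(t) \\
&= \xi^T(t)(A\Pi A^T - \Pi)\xi(t) + u^T(t)C\Pi C^Tu(t) \\
&\quad + \xi^T(t)A\Pi C^Tu(t) + u^T(t)C\Pi A^T\xi(t)
\end{aligned}$$

Moreover, from (7.21),

$$\begin{aligned}
V(t+1) - V(t) &= -\xi^T(t)Q\xi(t) + u^T(t)[\Lambda(0) - R]u(t) \\
&\quad + \xi^T(t)(\bar C^T - S)u(t) + u^T(t)(\bar C - S^T)\xi(t) \\
&= -\begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} + u^T(t)\Lambda(0)u(t) + \xi^T(t)\bar C^Tu(t) + u^T(t)\bar C\xi(t)
\end{aligned}$$

Taking the sum of both sides of the above equation over $(-\infty, -1]$ yields

$$\begin{aligned}
&\xi^T(0)\Pi\xi(0) + \sum_{t=-\infty}^{-1} \begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix}^T \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \begin{bmatrix} \xi(t) \\ u(t) \end{bmatrix} \\
&\quad = \sum_{t=-\infty}^{-1} u^T(t)\Lambda(0)u(t) + \sum_{t=-\infty}^{-1} \xi^T(t)\bar C^Tu(t) + \sum_{t=-\infty}^{-1} u^T(t)\bar C\xi(t) =: I_1 + I_2 + I_3
\end{aligned}$$

where we have used the boundary conditions $V(0) = \xi^T(0)\Pi\xi(0)$ and $V(-\infty) = 0$.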
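Since, from (7.48), $\xi(t) = \sum_{k=-\infty}^{t-1} (A^T)^{t-k-1}C^Tu(k)$, we see that

$$I_2 = \sum_{t=-\infty}^{-1} \xi^T(t)\bar C^Tu(t) = \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{t-1} u^T(k)CA^{t-k-1}\bar C^Tu(t) = \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{t-1} u^T(k)\Lambda(t-k)u(t)$$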
Also, noting that $I_3 = I_2^T$ holds, we change the order of sums in the above equation, and then interchange $t$ and $k$ to get

$$I_3 = \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{t-1} u^T(t)\Lambda^T(t-k)u(k) = \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{t-1} u^T(t)\Lambda(k-t)u(k) = \sum_{t=-\infty}^{-1}\sum_{k=t+1}^{-1} u^T(k)\Lambda(t-k)u(t) \qquad (7.87)$$

The last equality is due to the fact that for $t = -1$, the inner sum $\sum_{k=t+1}^{-1}$ is empty. Hence, it follows that

$$\begin{aligned}
I_1 + I_2 + I_3 &= \sum_{t=-\infty}^{-1}\Big( u^T(t)\Lambda(0)u(t) + \sum_{k=-\infty}^{t-1} u^T(k)\Lambda(t-k)u(t) + \sum_{k=t+1}^{-1} u^T(k)\Lambda(t-k)u(t) \Big) \\
&= \sum_{t=-\infty}^{-1}\sum_{k=-\infty}^{-1} u^T(k)\Lambda(t-k)u(t) = u^TT_+u
\end{aligned}$$

This completes the proof.
Stochastic Realization Theory (2)


This chapter presents the stochastic realization theory due to Akaike [2, 3]. First, we briefly review the method of canonical correlation analysis (CCA). We define the future and past spaces of a stationary stochastic process, and introduce two predictor spaces in terms of the orthogonal projection of the future onto the past, and vice versa. Based on these predictor spaces, we derive forward and backward innovation representations of a stationary process. We also discuss a stochastic balanced realization problem based on the CCA, including model reduction of stochastic systems. Finally, we present subspace algorithms to obtain stochastic state space models based on finite observed data. Some numerical results are included.


8.1 Canonical Correlation Analysis 


The canonical correlation analysis (CCA) is a technique of multivariate statistical analysis that clarifies the mutual dependence between two sets of variables by finding a new coordinate system in the space of each set of variables.

Let $x$ and $y$ be two vectors of zero mean random variables defined by

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} \in \mathbb{R}^k, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_l \end{bmatrix} \in \mathbb{R}^l$$

Let the linear spaces spanned by $x$ and $y$ respectively be given by

$$\mathbf{X} = \mathrm{span}\{x_1, \ldots, x_k\}, \qquad \mathbf{Y} = \mathrm{span}\{y_1, \ldots, y_l\}$$

First, we find vectors $w_1 \in \mathbf{X}$ and $z_1 \in \mathbf{Y}$ with the maximum mutual correlation, and define $(w_1, z_1)$ as the first coordinates in the new system. Then we find $w_2 \in \mathbf{X}$ and $z_2 \in \mathbf{Y}$ such that their correlation is maximum under the constraint that they are uncorrelated with the first coordinates $(w_1, z_1)$. This procedure is continued until two new coordinate systems are determined.
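Let the covariance matrices of two vectors $x$ and $y$ be given by

$$\Sigma = E\left\{ \begin{bmatrix} x \\ y \end{bmatrix} \begin{bmatrix} x^T & y^T \end{bmatrix} \right\} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \in \mathbb{R}^{(k+l)\times(k+l)} \qquad (8.1)$$

where it is assumed for simplicity that $\Sigma_{xx} > 0$ and $\Sigma_{yy} > 0$. Also, without loss of generality, we assume that $k \le l$. We define two scalar variables

$$w_1 = a^Tx = \sum_{i=1}^k a_ix_i, \qquad z_1 = b^Ty = \sum_{j=1}^l b_jy_j$$

by using two vectors $a \in \mathbb{R}^k$ and $b \in \mathbb{R}^l$, respectively. We wish to find the vectors $a$ and $b$ that maximize the correlation between $w_1$ and $z_1$, which is expressed as

$$\rho = \frac{\mathrm{cov}\{a^Tx, b^Ty\}}{\sqrt{\mathrm{cov}\{a^Tx\}}\sqrt{\mathrm{cov}\{b^Ty\}}} = \frac{a^T\Sigma_{xy}b}{(a^T\Sigma_{xx}a)^{1/2}(b^T\Sigma_{yy}b)^{1/2}}$$

Note that if a pair $(a, b)$ maximizes $\rho$, then the pair $(c_1a, c_2b)$ also maximizes $\rho$ for all scalars $c_1$, $c_2$ with $c_1c_2 > 0$. Thus, we impose the following normalization conditions

$$a^T\Sigma_{xx}a = 1, \qquad b^T\Sigma_{yy}b = 1 \qquad (8.2)$$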

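The problem of maximizing $\rho$ under the constraint of (8.2) is solved by means of the Lagrange method. Let the Lagrangian be given by

$$\mathcal{L} = a^T\Sigma_{xy}b + \frac{\lambda_1}{2}(1 - a^T\Sigma_{xx}a) + \frac{\lambda_2}{2}(1 - b^T\Sigma_{yy}b)$$

Then, the optimality conditions satisfied by the vectors $a$ and $b$ are

$$\frac{\partial\mathcal{L}}{\partial a} = \Sigma_{xy}b - \lambda_1\Sigma_{xx}a = 0, \qquad \frac{\partial\mathcal{L}}{\partial b} = \Sigma_{yx}a - \lambda_2\Sigma_{yy}b = 0 \qquad (8.3)$$

Pre-multiplying the first equation of (8.3) by $a^T$ and the second by $b^T$ and using (8.2), we have

$$a^T\Sigma_{xy}b = b^T\Sigma_{yx}a = \lambda_1 = \lambda_2$$

Letting $\lambda_1 = \lambda_2 = \rho$, it follows from (8.3) that

$$\Sigma_{xy}b - \rho\Sigma_{xx}a = 0, \qquad \Sigma_{yx}a - \rho\Sigma_{yy}b = 0 \qquad (8.4)$$

Since $\Sigma_{yy} > 0$, we can eliminate $b$ from the above equations to get

$$(\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} - \rho^2\Sigma_{xx})a = 0, \qquad a \ne 0 \qquad (8.5)$$

This is a generalized eigenvalue problem (GEP), since $\Sigma_{xx} \ne I$ in general.

We see that a necessary and sufficient condition for (8.5) to have a non-trivial solution $a$ is given by

$$\det(\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} - \rho^2\Sigma_{xx}) = 0 \qquad (8.6)$$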
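where this is a $k$th-order polynomial in $\rho^2$ since $\det(\Sigma_{xx}) \ne 0$. Let the square root matrices of $\Sigma_{xx}$ and $\Sigma_{yy}$ respectively be $\Sigma_{xx}^{1/2}$ and $\Sigma_{yy}^{1/2}$, satisfying

$$\Sigma_{xx} = \Sigma_{xx}^{1/2}\Sigma_{xx}^{T/2}, \qquad \Sigma_{yy} = \Sigma_{yy}^{1/2}\Sigma_{yy}^{T/2}$$

It therefore follows from (8.6) that

$$\det(\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-T/2}\Sigma_{yy}^{-1/2}\Sigma_{yx}\Sigma_{xx}^{-T/2} - \rho^2I_k) = 0$$

Define $\Xi := \Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-T/2} \in \mathbb{R}^{k\times l}$. Then, we have

$$\det(\Xi\Xi^T - \rho^2I_k) = 0 \qquad (8.7)$$

This implies that $\rho^2$ is an eigenvalue of $\Xi\Xi^T \in \mathbb{R}^{k\times k}$, and that the $k$ eigenvalues of $\Xi\Xi^T$ are nonnegative. Let $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_k \ge 0$ be the nonnegative square roots of the eigenvalues of $\Xi\Xi^T$, and let $a_1, a_2, \ldots, a_k \in \mathbb{R}^k$ be the corresponding eigenvectors obtained from (8.5). Then, we define the matrix

$$L = \begin{bmatrix} a_1 & a_2 & \cdots & a_k \end{bmatrix} \in \mathbb{R}^{k\times k}$$

Similarly, eliminating $a$ from (8.4) yields

$$(\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy} - \rho^2\Sigma_{yy})b = 0, \qquad b \ne 0$$

and hence

$$\det(\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy} - \rho^2\Sigma_{yy}) = 0 \qquad (8.8)$$

Since $\Sigma_{yy} > 0$, (8.8) is equivalent to $\det(\Xi^T\Xi - \rho^2I_l) = 0$. Since $\Xi^T\Xi \in \mathbb{R}^{l\times l}$ is nonnegative definite, it has $l$ nonnegative eigenvalues. Let $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_l \ge 0$ be the nonnegative square roots of the eigenvalues of $\Xi^T\Xi$.¹ Let the corresponding eigenvectors be given by $b_1, b_2, \ldots, b_l \in \mathbb{R}^l$, and define the matrix

$$M = \begin{bmatrix} b_1 & b_2 & \cdots & b_l \end{bmatrix} \in \mathbb{R}^{l\times l}$$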

Definition 8.1. The maximum correlation $\rho_1$ is called the first canonical correlation. In terms of the corresponding two vectors $a_1$ and $b_1$, we have two scalars

$$w_1 = a_1^Tx, \qquad z_1 = b_1^Ty$$

These variables are called the first canonical variables. Similarly, $\rho_i$ is called the $i$th canonical correlation, and $w_i = a_i^Tx$ and $z_i = b_i^Ty$ are the $i$th canonical variables. Also, the two vectors $w = L^Tx$ and $z = M^Ty$ are called canonical vectors.

The following lemma shows that $L$ and $M$ are inverse square roots of the covariance matrices $\Sigma_{xx} \in \mathbb{R}^{k\times k}$ and $\Sigma_{yy} \in \mathbb{R}^{l\times l}$, respectively.

¹Since the nonzero eigenvalues of $\Xi\Xi^T$ and $\Xi^T\Xi$ are equal, we use the same symbol for them.
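Lemma 8.1. Let $L$ and $M$ be defined as above. Then, we have

$$L^T\Sigma_{xx}L = I_k, \qquad M^T\Sigma_{yy}M = I_l \qquad (8.9)$$

$$L^T\Sigma_{xy}M = \begin{bmatrix} \mathrm{diag}(\rho_1, \rho_2, \ldots, \rho_k) & 0_{k\times(l-k)} \end{bmatrix} \in \mathbb{R}^{k\times l} \qquad (8.10)$$

where $1 \ge \rho_1 \ge \cdots \ge \rho_k \ge 0$.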


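Proof. We prove (8.9) under the assumption that $\rho_i \ne \rho_j$ for $i \ne j$. Let $(a_i, b_i)$ and $(a_j, b_j)$ be pairs of eigenvectors corresponding to $\rho_i$ and $\rho_j$, respectively. From (8.5), we have

$$\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}a_i = \rho_i^2\Sigma_{xx}a_i, \qquad \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}a_j = \rho_j^2\Sigma_{xx}a_j$$

Pre-multiplying the first and the second equations by $a_j^T$ and $a_i^T$, respectively, and subtracting the resulting equations yield

$$(\rho_i^2 - \rho_j^2)a_j^T\Sigma_{xx}a_i = 0, \qquad i \ne j$$

Thus we see that $a_j^T\Sigma_{xx}a_i = 0$, $i \ne j$. In view of (8.2), this fact implies that $L^T\Sigma_{xx}L = I_k$. We can also prove $M^T\Sigma_{yy}M = I_l$ by using $b_i$ and $b_j$.

It follows from (8.4) that $\Sigma_{xy}b_i = \rho_i\Sigma_{xx}a_i$. Pre-multiplying this equation by $a_i^T$ yields

$$a_i^T\Sigma_{xy}b_i = \rho_ia_i^T\Sigma_{xx}a_i = \rho_i$$

Similarly, pre-multiplying $\Sigma_{xy}b_i = \rho_i\Sigma_{xx}a_i$ by $a_j^T$ $(j \ne i)$ gives

$$a_j^T\Sigma_{xy}b_i = \rho_ia_j^T\Sigma_{xx}a_i = 0, \qquad j \ne i$$

These equations prove (8.10).

Finally we show that $\rho^2 \le 1$. Let $\theta$ be a scalar, and consider

$$\det(\theta\Sigma_{xx} - (\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx})) = 0$$

Since $\Sigma_{xx} > 0$ and $\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \ge 0$ (the Schur complement of $\Sigma_{yy}$ in $\Sigma \ge 0$), we get $\theta \ge 0$. This can be proved by using the technique of simultaneous diagonalization of two nonnegative definite matrices. Thus, we have

$$\det((1 - \theta)\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}) = 0$$

Comparing this with (8.6) gives $\theta = 1 - \rho^2 \ge 0$, and hence $\rho^2 \le 1$.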
Let $w = L^Tx$ and $z = M^Ty$. Then, we see from Lemma 8.1 that

$$E\{ww^T\} = I_k, \qquad E\{zz^T\} = I_l$$

and

$$E\{wz^T\} = D = \begin{bmatrix} \mathrm{diag}(\rho_1, \ldots, \rho_k) & 0 \end{bmatrix} \qquad (8.11)$$

The elements of the canonical vectors $w$ and $z$, which are respectively obtained by linear transforms of $x$ and $y$, are mutually uncorrelated with mean zero and unit variance, and they are arranged in descending order of mutual correlations as shown in Table 8.1. Thus, the CCA simultaneously whitens the two vectors and diagonalizes their cross-covariance.

Table 8.1. Canonical correlation analysis

We see from (8.7) that the canonical correlations $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_k$ are the singular values of $\Xi$, so that they are computed as follows.


Lemma 8.2. Suppose that the covariance matrices of $x$ and $y$ are given by (8.1). Then, the canonical correlations are computed by the SVD

$$\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-T/2} = UDV^T \qquad (8.12)$$

where $D$ is defined by (8.11). Also, the canonical vectors are given by

$$w = L^Tx = U^T\Sigma_{xx}^{-1/2}x, \qquad z = M^Ty = V^T\Sigma_{yy}^{-1/2}y$$

Proof. It follows from (8.12) that $(U^T\Sigma_{xx}^{-1/2})\Sigma_{xy}(\Sigma_{yy}^{-T/2}V) = D$. Comparing this with (8.10) gives the desired results.
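Lemma 8.2 gives a direct numerical recipe. In the NumPy sketch below, the square roots $\Sigma_{xx}^{1/2}$ and $\Sigma_{yy}^{1/2}$ are taken to be lower Cholesky factors, which is one admissible choice since they satisfy $\Sigma = \Sigma^{1/2}\Sigma^{T/2}$ as required; the joint covariance is a random positive definite matrix used only for illustration, and `cca_from_covariance` is a hypothetical helper name. The sketch returns $\rho_i$, $L$ and $M$, which can be checked against (8.9), (8.10) and the eigenvalues of the generalized eigenvalue problem (8.5).

```python
import numpy as np

def cca_from_covariance(Sxx, Sxy, Syy):
    """Canonical correlations and transforms L, M via the SVD (8.12).

    Sigma^{1/2} is taken as the lower Cholesky factor (one admissible choice).
    """
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Sxx))   # Sxx^{-1/2}
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Syy))   # Syy^{-1/2}
    Xi = Lx_inv @ Sxy @ Ly_inv.T                      # Xi = Sxx^{-1/2} Sxy Syy^{-T/2}
    U, rho, Vt = np.linalg.svd(Xi)                    # U: k x k, Vt: l x l, rho descending
    L = Lx_inv.T @ U                                  # w = L^T x = U^T Sxx^{-1/2} x
    M = Ly_inv.T @ Vt.T                               # z = M^T y = V^T Syy^{-1/2} y
    return rho, L, M

# Hypothetical joint covariance of [x; y] with k = 2, l = 3
rng = np.random.default_rng(0)
k, l = 2, 3
G = rng.standard_normal((k + l, k + l))
Sigma = G @ G.T + 0.1 * np.eye(k + l)
Sxx, Sxy, Syy = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]

rho, L, M = cca_from_covariance(Sxx, Sxy, Syy)
D = np.zeros((k, l))
D[:k, :k] = np.diag(rho)                              # D of (8.11)
```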


8.2 Stochastic Realization Problem 


We consider the same stochastic realization problem treated in Chapter 7. Suppose that $\{y(t),\ t = 0, \pm1, \ldots\}$ is a regular full rank $p$-dimensional stationary process. We assume that the mean of $y$ is zero and the covariance matrix is given by

$$\Lambda(l) = E\{y(t+l)y^T(t)\}, \qquad l = 0, \pm1, \ldots \qquad (8.13)$$

Suppose that the covariance matrices satisfy the summability condition

$$\sum_{l=-\infty}^{\infty} \|\Lambda(l)\| < \infty \qquad (8.14)$$

Then, the spectral density matrix of $y$ is defined by

$$\Phi(z) = \sum_{l=-\infty}^{\infty} \Lambda(l)z^{-l} \qquad (8.15)$$

Given the covariance matrices (or equivalently the spectral density matrix) of a stationary process $y$, the stochastic realization problem is to find a Markov model of the form

$$x(t+1) = Ax(t) + w(t) \qquad (8.16a)$$

$$y(t) = Cx(t) + v(t) \qquad (8.16b)$$

where $x \in \mathbb{R}^n$ is a state vector, and $w \in \mathbb{R}^n$ and $v \in \mathbb{R}^p$ are white noises with mean zero and covariance matrices

$$E\left\{ \begin{bmatrix} w(t) \\ v(t) \end{bmatrix} \begin{bmatrix} w^T(s) & v^T(s) \end{bmatrix} \right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta_{ts} \qquad (8.17)$$

In the following, it is assumed that we have an infinite sequence of data, from which we can compute the true covariance matrices.
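Let $t$ be the present time. Define the infinite dimensional future and past vectors

$$f(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \end{bmatrix}, \qquad p(t) = \begin{bmatrix} y(t-1) \\ y(t-2) \\ \vdots \end{bmatrix}$$

Then, the cross-covariance matrix of the future and past is given by

$$H = E\{f(t)p^T(t)\} = \begin{bmatrix} \Lambda(1) & \Lambda(2) & \Lambda(3) & \cdots \\ \Lambda(2) & \Lambda(3) & \Lambda(4) & \cdots \\ \Lambda(3) & \Lambda(4) & \Lambda(5) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \qquad (8.18)$$

and the covariance matrices of the future and the past are respectively given by

$$T_+ = E\{f(t)f^T(t)\} = \begin{bmatrix} \Lambda(0) & \Lambda^T(1) & \Lambda^T(2) & \cdots \\ \Lambda(1) & \Lambda(0) & \Lambda^T(1) & \cdots \\ \Lambda(2) & \Lambda(1) & \Lambda(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \qquad (8.19)$$

and

$$T_- = E\{p(t)p^T(t)\} = \begin{bmatrix} \Lambda(0) & \Lambda(1) & \Lambda(2) & \cdots \\ \Lambda^T(1) & \Lambda(0) & \Lambda(1) & \cdots \\ \Lambda^T(2) & \Lambda^T(1) & \Lambda(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \qquad (8.20)$$

It should be noted that $H$ is an infinite block Hankel matrix, and $T_\pm$ are infinite block Toeplitz matrices.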
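Finite truncations of (8.18) to (8.20) are easy to form numerically. The NumPy sketch below builds $k$-block truncations from the exact covariances of a hypothetical second-order Markov model (illustrative values and helper names, not from the text) and checks that the truncated Hankel matrix has numerical rank $n = 2$, consistent with the finite-rank condition discussed next, while the truncated Toeplitz matrices are positive definite, as covariance matrices of stacked outputs with nonsingular measurement noise should be. The rank threshold `1e-10` relative to the largest singular value is an assumption.

```python
import numpy as np

def lyap(A, Q):
    """Solve X = A X A^T + Q by vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

# Hypothetical Markov model (8.16)-(8.17), illustrative values
A = np.array([[0.5, 0.2], [0.0, -0.3]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.05]])
R = np.array([[1.0]])
Sigma = lyap(A, Q)
Cbar = (A @ Sigma @ C.T + S).T
Lam0 = C @ Sigma @ C.T + R

def Lam(l):
    """Lambda(l) of (8.13) for this model, with Lambda(-l) = Lambda(l)^T."""
    if l == 0:
        return Lam0
    if l > 0:
        return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T
    return Lam(-l).T

def hankel(k):
    """Truncation of H in (8.18): (i, j) block Lambda(i + j + 1)."""
    return np.block([[Lam(i + j + 1) for j in range(k)] for i in range(k)])

def toeplitz_future(k):
    """Truncation of T_+ in (8.19): (i, j) block Lambda(i - j)."""
    return np.block([[Lam(i - j) for j in range(k)] for i in range(k)])

def toeplitz_past(k):
    """Truncation of T_- in (8.20): (i, j) block Lambda(j - i)."""
    return np.block([[Lam(j - i) for j in range(k)] for i in range(k)])

sv = np.linalg.svd(hankel(8), compute_uv=False)
numerical_rank = int(np.sum(sv > 1e-10 * sv[0]))
```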

Let $\mathbf{Y} = \mathrm{span}\{y(t) \mid t = 0, \pm1, \ldots\}$ be the Hilbert space generated by all linear combinations of the components of the second-order stationary stochastic process $y$. Let $\mathbf{Y}_t^+$ and $\mathbf{Y}_t^-$ respectively be the linear spaces generated by the future $f(t)$ and the past $p(t)$, i.e.,

$$\mathbf{Y}_t^+ = \mathrm{span}\{y(t), y(t+1), \ldots\}, \qquad \mathbf{Y}_t^- = \mathrm{span}\{y(t-1), y(t-2), \ldots\}$$

We assume that these spaces are closed with respect to the mean-square norm, so that $\mathbf{Y}_t^+$ and $\mathbf{Y}_t^-$ are subspaces of the Hilbert space $\mathbf{Y}$.


8.3 Akaike’s Method 


In this section, we shall study the CCA-based stochastic realization method due to 
Akaike [2, 3]. 


8.3.1 Predictor Spaces 


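A necessary and sufficient condition for $y$ to have a finite dimensional stochastic realization is that the Hankel matrix of (8.18) has finite rank, i.e., $\mathrm{rank}(H) < \infty$. In order to show this fact by means of the CCA technique, we begin with the definition of forward and backward predictor spaces.

Definition 8.2. Let the orthogonal projection of the future $\mathbf{Y}_t^+$ onto the past $\mathbf{Y}_t^-$ be defined by

$$\begin{aligned}
\mathbf{X}_t^- = \hat E\{\mathbf{Y}_t^+ \mid \mathbf{Y}_t^-\} &= \mathrm{span}\{\hat E\{y(t+h) \mid \mathbf{Y}_t^-\} \mid h = 0, 1, \ldots\} \\
&= \mathrm{span}\{\hat y(t+h \mid t-) \mid h = 0, 1, \ldots\}
\end{aligned}$$

and let the orthogonal projection of the past $\mathbf{Y}_t^-$ onto the future $\mathbf{Y}_t^+$ be given by

$$\begin{aligned}
\mathbf{X}_t^+ = \hat E\{\mathbf{Y}_t^- \mid \mathbf{Y}_t^+\} &= \mathrm{span}\{\hat E\{y(t-l) \mid \mathbf{Y}_t^+\} \mid l = 1, 2, \ldots\} \\
&= \mathrm{span}\{\hat y(t-l \mid t+) \mid l = 1, 2, \ldots\}
\end{aligned}$$

Then the spaces $\mathbf{X}_t^-$ and $\mathbf{X}_t^+$ are called the forward and the backward predictor spaces, respectively.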
The generators $\hat y(t+h \mid t-)$ of the forward predictor space are the minimum variance estimates of the future $y(t+h)$, $h = 0, 1, \ldots$ based on the past $\mathbf{Y}_t^-$, and the generators $\hat y(t-l \mid t+)$ of the backward predictor space are the minimum variance estimates of the past $y(t-l)$, $l = 1, 2, \ldots$ based on the future $\mathbf{Y}_t^+$. The notations $\hat y(t+h \mid t-)$ and $\hat y(t-l \mid t+)$ are used in this section only; in fact, the optimal forward estimates should be written as $\hat y(t+h \mid t-1)$ by using the notation defined in Chapter 5.

The optimality conditions for the forward and backward estimates are that the respective estimation errors are orthogonal to the data spaces $\mathbf{Y}_t^-$ and $\mathbf{Y}_t^+$. Thus the optimality conditions are expressed as

$$E\{[y(t+h) - \hat y(t+h \mid t-)]y^T(t-l)\} = 0$$

and

$$E\{[y(t-l) - \hat y(t-l \mid t+)]y^T(t+h)\} = 0$$

where $h = 0, 1, \ldots$ and $l = 1, 2, \ldots$.


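Lemma 8.3. Suppose that $\operatorname{rank}(H) < \infty$. Then the two predictor spaces $\mathcal{X}_t^-$ and $\mathcal{X}_t^+$ are finite dimensional, and are respectively written as

$$\mathcal{X}_t^- = \operatorname{span}\{\hat y(t+h \mid t^-) \mid h = 0, 1, \dots, r-1\}$$

and

$$\mathcal{X}_t^+ = \operatorname{span}\{\hat y(t-l \mid t^+) \mid l = 1, 2, \dots, r\}$$

where $r$ is a positive integer determined by the factorization of the given covariance matrices $\{\Lambda(k),\ k = 1, 2, \dots\}$.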
Proof. Since $\operatorname{rank}(H) < \infty$, the covariance matrix $\Lambda(k)$ has a factorization given by (7.7). Thus it follows from Theorem 3.13 that there exist an integer $r > 0$ and scalars $\alpha_1, \dots, \alpha_r \in \mathbb{R}$ such that

$$\Lambda(r+k) + \sum_{i=1}^{r} \alpha_i \Lambda(r+k-i) = 0, \quad k = 1, 2, \dots \tag{8.21}$$


This is a set of linear equations satisfied by the covariance matrices. From the defi- 
nition of covariance matrices, it can be shown that (8.21) is rewritten as 


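$$E\Big\{\Big[y(t+r+h) + \sum_{i=1}^{r} \alpha_i y(t+r+h-i)\Big]\, y^T(t-l)\Big\} = 0 \tag{8.22}$$

and

$$E\Big\{\Big[y(t-r-l) + \sum_{i=1}^{r} \alpha_i y(t-r-l+i)\Big]\, y^T(t+h)\Big\} = 0 \tag{8.23}$$

where $h = 0, 1, \dots$ and $l = 1, 2, \dots$. We see that (8.22) is equivalent to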
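$$\hat E\Big\{y(t+r+h) + \sum_{i=1}^{r} \alpha_i y(t+r+h-i) \,\Big|\, \mathcal{Y}_t^-\Big\} = 0, \quad h = 0, 1, \dots$$

so that we have

$$\hat y(t+r+h \mid t^-) = -\sum_{i=1}^{r} \alpha_i \hat y(t+r+h-i \mid t^-), \quad h = 0, 1, \dots \tag{8.24}$$

Similarly, from (8.23),

$$\hat E\Big\{y(t-r-l) + \sum_{i=1}^{r} \alpha_i y(t-r-l+i) \,\Big|\, \mathcal{Y}_t^+\Big\} = 0, \quad l = 1, 2, \dots$$

This implies that

$$\hat y(t-r-l \mid t^+) = -\sum_{i=1}^{r} \alpha_i \hat y(t-r-l+i \mid t^+), \quad l = 1, 2, \dots \tag{8.25}$$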


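From (8.24) and (8.25), we see that the predictor spaces $\mathcal{X}_t^- = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\}$ and $\mathcal{X}_t^+ = \hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$ are finite dimensional; the former is generated by the forward predictors $\hat y(t+h \mid t^-) = \hat E\{y(t+h) \mid \mathcal{Y}_t^-\}$, $h = 0, 1, \dots, r-1$, and the latter by the backward predictors $\hat y(t-l \mid t^+) = \hat E\{y(t-l) \mid \mathcal{Y}_t^+\}$, $l = 1, 2, \dots, r$.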
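The recursion (8.21) can be checked numerically once a factorization of the covariances is available. The sketch below assumes that the factorization (7.7) has the form $\Lambda(k) = CA^{k-1}\bar C^T$, $k \ge 1$, uses a hypothetical second-order triple $(A, C, \bar C)$ with scalar output, and fits $\alpha_1, \dots, \alpha_r$ by least squares. By the Cayley-Hamilton theorem, the coefficients of $\det(zI - A)$ satisfy (8.21) with $r = n$, so the fitted values should reproduce them; this is one valid choice of $(r, \alpha_i)$, not necessarily the construction used in Theorem 3.13. The helper names are illustrative, and the generated sequence is used only as an algebraic example (it is not checked to be a valid covariance sequence).

```python
import numpy as np


def covariance_sequence(A, C, CbarT, m_max):
    """Return [Lambda(1), ..., Lambda(m_max)] with Lambda(m) = C A^(m-1) Cbar^T (scalar output)."""
    lam = []
    Ak = np.eye(A.shape[0])  # holds A^(m-1), starting from A^0
    for _ in range(m_max):
        lam.append((C @ Ak @ CbarT).item())
        Ak = Ak @ A
    return np.array(lam)


def recursion_coefficients(lam, r):
    """Least-squares fit of alpha_1..alpha_r in
    Lambda(r+k) + sum_{i=1}^{r} alpha_i Lambda(r+k-i) = 0, k = 1, 2, ...  (cf. (8.21)).
    lam[m-1] holds Lambda(m)."""
    ks = range(1, len(lam) - r + 1)
    Phi = np.array([[lam[r + k - i - 1] for i in range(1, r + 1)] for k in ks])
    rhs = -np.array([lam[r + k - 1] for k in ks])
    alpha, *_ = np.linalg.lstsq(Phi, rhs, rcond=None)
    return alpha


# Hypothetical minimal (A, C, Cbar) with n = 2, used only to generate Lambda(k)
A = np.array([[0.5, 0.2], [0.0, 0.3]])
C = np.array([[1.0, 0.0]])
CbarT = np.array([[0.0], [1.0]])

lam = covariance_sequence(A, C, CbarT, m_max=12)
alpha = recursion_coefficients(lam, r=2)
char_poly = np.poly(A)[1:]  # det(zI - A) = z^2 + c_1 z + c_2  ->  [c_1, c_2]
```

Here `alpha` and `char_poly` should agree, illustrating that a finite-rank covariance sequence obeys a linear recursion of order $r$.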


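In the above lemma, the positive integer $r$ may not be minimal. But, applying the CCA described in Section 8.1 to the following two vectors

$$\begin{bmatrix} \hat y(t \mid t^-) \\ \hat y(t+1 \mid t^-) \\ \vdots \\ \hat y(t+r-1 \mid t^-) \end{bmatrix}, \qquad \begin{bmatrix} \hat y(t-1 \mid t^+) \\ \hat y(t-2 \mid t^+) \\ \vdots \\ \hat y(t-r \mid t^+) \end{bmatrix}$$

we obtain minimal-dimensional orthonormal basis vectors $x(t)$ and $\bar x(t)$ for the predictor spaces $\mathcal{X}_t^-$ and $\mathcal{X}_t^+$, respectively. Being orthonormal, they satisfy $\operatorname{cov}\{x(t)\} = I_n = \operatorname{cov}\{\bar x(t)\}$. It should be noted that since $y(t)$ is a stationary process, the orthonormal bases $x(t)$ and $\bar x(t)$ of $\mathcal{X}_t^-$ and $\mathcal{X}_t^+$ are jointly stationary.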

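For the transition from time $t$ to $t+1$, the predictor space evolves from $\mathcal{X}_t^-$ to $\mathcal{X}_{t+1}^- = \hat E\{\mathcal{Y}_{t+1}^+ \mid \mathcal{Y}_{t+1}^-\}$. Since $\mathcal{Y}_{t+1}^- = \mathcal{Y}_t^- \vee \operatorname{span}\{y(t)\}$, the space $\mathcal{Y}_{t+1}^-$ has the orthogonal decomposition

$$\mathcal{Y}_{t+1}^- = \mathcal{Y}_t^- \oplus \operatorname{span}\{\nu(t)\} \tag{8.26}$$

where $\nu(t) := y(t) - \hat y(t \mid t^-)$ is the forward innovation for $y(t)$. Thus, it follows that

$$\begin{aligned} \mathcal{X}_{t+1}^- &= \hat E\{\mathcal{Y}_{t+1}^+ \mid \mathcal{Y}_{t+1}^-\} = \hat E\{\mathcal{Y}_{t+1}^+ \mid \mathcal{Y}_t^- \oplus \operatorname{span}\{\nu(t)\}\} \\ &= \hat E\{\mathcal{Y}_{t+1}^+ \mid \mathcal{Y}_t^-\} + \hat E\{\mathcal{Y}_{t+1}^+ \mid \operatorname{span}\{\nu(t)\}\} \end{aligned}$$

Since $\mathcal{Y}_{t+1}^+ \subset \mathcal{Y}_t^+$, we see that $\hat E\{\mathcal{Y}_{t+1}^+ \mid \mathcal{Y}_t^-\} \subset \mathcal{X}_t^-$. Hence, the first term on the right-hand side of the above equation is contained in $\mathcal{X}_t^-$, and the transition from $\mathcal{X}_t^-$ to $\mathcal{X}_{t+1}^-$ is made by adding the new information $\nu(t)$.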


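Definition 8.3. [105] Suppose that a subspace $\mathcal{S}_t\ (\subset \mathcal{Y}_t^-)$ satisfies

$$\hat E\{\mathcal{Y}_t^+ \mid \mathcal{S}_t\} = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\}$$

Then $\mathcal{S}_t$ is called a splitting subspace for $(\mathcal{Y}_t^+, \mathcal{Y}_t^-)$.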


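Lemma 8.4. The predictor space $\mathcal{X}_t^- = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\}$ is a minimal splitting subspace for $(\mathcal{Y}_t^+, \mathcal{Y}_t^-)$.

Proof. We note that $\mathcal{X}_t^- = \hat E\{\mathcal{X}_t^- \mid \mathcal{X}_t^-\}$ and $\mathcal{X}_t^- \subset \mathcal{Y}_t^-$ hold. It thus follows from the property of orthogonal projection that

$$\begin{aligned} \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\} = \mathcal{X}_t^- &= \hat E\{\mathcal{X}_t^- \mid \mathcal{X}_t^-\} = \hat E\{\hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\} \mid \mathcal{X}_t^-\} \\ &= \hat E\{\mathcal{Y}_t^+ \mid \mathcal{X}_t^-\} \end{aligned}$$

The last equality implies that $\mathcal{X}_t^-$ is a splitting subspace for $(\mathcal{Y}_t^+, \mathcal{Y}_t^-)$. Also, it can be shown that if a subspace $\mathcal{S}_t$ of $\mathcal{Y}_t^-$ satisfies

$$\hat E\{\mathcal{Y}_t^+ \mid \mathcal{S}_t\} = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\}$$

then $\mathcal{S}_t \supset \mathcal{X}_t^-$. Hence, $\mathcal{X}_t^-$ is the minimal splitting subspace for $(\mathcal{Y}_t^+, \mathcal{Y}_t^-)$.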


This lemma shows that $\mathcal{X}_t^-$ contains the minimal information necessary to predict the future of the output $y$ based on the past $\mathcal{Y}_t^-$. We can also show that the backward predictor space $\mathcal{X}_t^+ = \hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$ is the minimal splitting subspace for $(\mathcal{Y}_t^-, \mathcal{Y}_t^+)$ contained in the future $\mathcal{Y}_t^+$. Thus, the two predictor spaces defined above can be viewed as basic interfaces between the past and the future in stochastic systems. It will be shown that we can derive a Markov model for the stationary process $y(t)$ by using either $\bar x(t)$ or $x(t)$.


8.3.2 Markovian Representations 


The stochastic realization technique due to Faurre considered in Section 7.3 is based on the deterministic realization method that computes $(A, C, \bar C, \Lambda(0))$ from the given covariance matrices and then finds solutions $(\Pi > 0, Q, R, S)$ of the LMI (7.26). On the other hand, the method to be developed here is based on the CCA technique, so that it is completely different from that of Section 7.3. By deriving a basis vector of the predictor space by the CCA, we obtain a Markov model whose state vector is given by that basis vector.

We first derive a stochastic realization based on the basis vector $\bar x(t) \in \mathcal{X}_t^+$. Recall that $\bar x(t)$ has zero mean and covariance matrix $I_n$.
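Theorem 8.1. In terms of the basis vector $\bar x(t) \in \mathcal{X}_t^+$, a Markov model for the stationary process $y$ is given by

$$\bar x(t+1) = A\bar x(t) + w(t) \tag{8.27a}$$

$$y(t) = C\bar x(t) + v(t) \tag{8.27b}$$

where $A \in \mathbb{R}^{n\times n}$, $C \in \mathbb{R}^{p\times n}$, $\bar C \in \mathbb{R}^{p\times n}$ satisfy

$$A = E\{\bar x(t+1)\bar x^T(t)\}, \qquad C = E\{y(t)\bar x^T(t)\}, \qquad \bar C = E\{y(t)\bar x^T(t+1)\} \tag{8.28}$$

Also, $w$ and $v$ are white noise vectors with mean zero and covariance matrices

$$E\left\{\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}\begin{bmatrix} w^T(s) & v^T(s) \end{bmatrix}\right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta_{ts}$$

where $\operatorname{cov}\{\bar x(t)\} = \Pi = I_n$ and

$$Q = I_n - AA^T, \qquad S = \bar C^T - AC^T, \qquad R = \Lambda(0) - CC^T \tag{8.29}$$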


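Proof. 1° Let the basis vector of $\mathcal{X}_t^+$ be given by $\bar x(t)$. Since $\mathcal{X}_t^+ = \hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\} \subset \mathcal{Y}_t^+$, we see that $\bar x(t)$ is included in $\mathcal{Y}_t^+$. Also, we get $\bar x(t+1) \in \mathcal{Y}_{t+1}^+ \subset \mathcal{Y}_t^+$. Hence, we can decompose $\bar x(t+1)$ as the sum of the orthogonal projections onto $\operatorname{span}\{\bar x(t)\}$ and its complementary space $(\operatorname{span}\{\bar x(t)\})^\perp \cap \mathcal{Y}_t^+$, i.e.,

$$\bar x(t+1) = \hat E\{\bar x(t+1) \mid \bar x(t)\} + \hat E\{\bar x(t+1) \mid (\operatorname{span}\{\bar x(t)\})^\perp \cap \mathcal{Y}_t^+\}$$

The first term on the right-hand side of the above equation is expressed as

$$\hat E\{\bar x(t+1) \mid \bar x(t)\} = E\{\bar x(t+1)\bar x^T(t)\}\big(E\{\bar x(t)\bar x^T(t)\}\big)^{-1}\bar x(t) = A\bar x(t)$$

Define $w(t) := \bar x(t+1) - A\bar x(t)$. Then $w(t) \in \mathcal{Y}_t^+$, but $w(t) \perp \mathcal{X}_t^+$. We show that $w(t)$ is orthogonal to $\mathcal{Y}_t^-$. Let $\xi \in \mathcal{Y}_t^-$. Then we have $\hat E\{\xi \mid \mathcal{Y}_t^+\} \in \mathcal{X}_t^+$. Also, by definition, $\xi - \hat E\{\xi \mid \mathcal{Y}_t^+\} \perp \mathcal{Y}_t^+$, and hence $\xi - \hat E\{\xi \mid \mathcal{Y}_t^+\} \perp w(t)$. Since $\hat E\{\xi \mid \mathcal{Y}_t^+\} \perp w(t)$, we obtain $\xi \perp w(t)$ for any $\xi \in \mathcal{Y}_t^-$. This proves the desired result.

Thus, $w(t+l)$ is orthogonal to $\mathcal{Y}_{t+l}^-$. Since $\mathcal{Y}_t^- \subset \mathcal{Y}_{t+l}^-$, $l = 0, 1, \dots$, we see that $w(t+l) \perp \mathcal{Y}_t^-$ holds. Moreover, since $w(t+l) \in \mathcal{Y}_{t+l}^+ \subset \mathcal{Y}_t^+$, it follows that $w(t+l)$ is orthogonal to $\hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\} = \mathcal{X}_t^+ \ni \bar x(t)$, implying that

$$w(t+l) \perp \bar x(t),\ y(t-1), \quad l = 0, 1, \dots \tag{8.30}$$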
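2° Since $y(t) \in \mathcal{Y}_t^+$ and $\mathcal{X}_t^+ \subset \mathcal{Y}_t^+$, the output $y(t)$ has a unique decomposition

$$y(t) = \hat E\{y(t) \mid \operatorname{span}\{\bar x(t)\}\} + \hat E\{y(t) \mid (\operatorname{span}\{\bar x(t)\})^\perp \cap \mathcal{Y}_t^+\} = C\bar x(t) + v(t)$$

This shows that (8.27b) holds. By the definition of $v(t)$, we have $v(t) \in \mathcal{Y}_t^+$ and $v(t) \perp \mathcal{X}_t^+$. As in the proof of $w(t) \perp \mathcal{Y}_t^-$ in 1°, we can show that $v(t) \perp \mathcal{Y}_t^-$, and hence $v(t+h) \perp \mathcal{Y}_t^-$, $h = 0, 1, \dots$. Moreover, since $v(t+h) \in \mathcal{Y}_{t+h}^+ \subset \mathcal{Y}_t^+$ holds, it follows that $v(t+h)$ is orthogonal to $\mathcal{X}_t^+ = \hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$, i.e., $v(t+h) \perp \bar x(t)$, $h = 0, 1, \dots$. This implies that

$$v(t+l) \perp \bar x(t),\ y(t-1), \quad l = 0, 1, \dots \tag{8.31}$$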
3° We see from (8.30) that

$$E\{w(t+l)w^T(t)\} = E\{w(t+l)[\bar x(t+1) - A\bar x(t)]^T\} = 0, \quad l = 1, 2, \dots$$

This implies that $w$ is a white noise. Also, it follows from (8.31) that $v$ is a white noise. Again, from (8.30) and (8.31), we have

$$E\{v(t+l)w^T(t)\} = E\{v(t+l)[\bar x(t+1) - A\bar x(t)]^T\} = 0, \quad l = 1, 2, \dots$$

and

$$E\{w(t+l)v^T(t)\} = E\{w(t+l)[y(t) - C\bar x(t)]^T\} = 0, \quad l = 1, 2, \dots$$

This shows that $(w, v)$ are jointly white noises. Finally, it can be shown from (8.28) that

$$\begin{aligned} Q &= E\{[\bar x(t+1) - A\bar x(t)][\bar x(t+1) - A\bar x(t)]^T\} = I_n - AA^T \\ S &= E\{[\bar x(t+1) - A\bar x(t)][y(t) - C\bar x(t)]^T\} = \bar C^T - AC^T \\ R &= E\{[y(t) - C\bar x(t)][y(t) - C\bar x(t)]^T\} = \Lambda(0) - CC^T \end{aligned}$$

This completes the proof of (8.29).


We now show that the orthogonal projection of (8.27a) onto $\mathcal{Y}_{t+1}^-$ yields another Markov model for $y$. It should be noted that projecting the state vector of (8.27a) onto the past $\mathcal{Y}_{t+1}^-$ is equivalent to constructing the stationary Kalman filter for the system described by (8.27).

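Since $\hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\} = \mathcal{X}_t^- = \operatorname{span}\{x(t)\}$, we see from the proof of Lemma 8.4 that

$$\hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\} = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{X}_t^-\} = \hat E\{\mathcal{Y}_t^+ \mid \operatorname{span}\{x(t)\}\}$$

Thus, noting that $\bar x(t) \in \mathcal{Y}_t^+$, it follows that

$$\hat E\{\bar x(t) \mid \mathcal{Y}_t^-\} = \hat E\{\bar x(t) \mid x(t)\} = E\{\bar x(t)x^T(t)\}\big(E\{x(t)x^T(t)\}\big)^{-1}x(t) = \Upsilon x(t) \tag{8.32}$$

where $\Upsilon = E\{\bar x(t)x^T(t)\} = \operatorname{diag}(\rho_1, \rho_2, \dots, \rho_n)$. The next theorem gives the second Markovian model due to Akaike [2].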
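Theorem 8.2. In terms of $z(t) = \Upsilon x(t) \in \mathcal{Y}_t^-$, a Markov model for $y$ is given by

$$z(t+1) = Az(t) + \tilde w(t) \tag{8.33a}$$

$$y(t) = Cz(t) + \tilde v(t) \tag{8.33b}$$

where the covariance matrices satisfy $\operatorname{cov}\{z(t)\} = \Pi = \Upsilon^2$ and

$$Q = \Pi - A\Pi A^T, \qquad S = \bar C^T - A\Pi C^T, \qquad R = \Lambda(0) - C\Pi C^T \tag{8.34}$$

Proof. Similarly to (8.32), we have

$$\hat E\{\bar x(t+1) \mid \mathcal{Y}_{t+1}^-\} = \Upsilon x(t+1) = z(t+1)$$

By using the orthogonal decomposition $\mathcal{Y}_{t+1}^- = \mathcal{Y}_t^- \oplus \operatorname{span}\{\nu(t)\}$ of (8.26), we project the right-hand side of (8.27a) onto $\mathcal{Y}_{t+1}^-$ to get

$$\begin{aligned} z(t+1) &= \hat E\{A\bar x(t) + w(t) \mid \mathcal{Y}_{t+1}^-\} = \hat E\{A\bar x(t) + w(t) \mid \mathcal{Y}_t^- \oplus \operatorname{span}\{\nu(t)\}\} \\ &= \hat E\{A\bar x(t) + w(t) \mid \mathcal{Y}_t^-\} + \hat E\{\bar x(t+1) \mid \nu(t)\} \end{aligned}$$

From (8.32) and the fact that $w(t) \perp \mathcal{Y}_t^-$, the first term on the right-hand side of the above equation becomes $\hat E\{A\bar x(t) + w(t) \mid \mathcal{Y}_t^-\} = Az(t)$. Defining the second term as $\tilde w(t) := \hat E\{\bar x(t+1) \mid \nu(t)\}$, we have (8.33a). Since $y(t) \in \mathcal{Y}_{t+1}^-$, it has a unique decomposition

$$\begin{aligned} y(t) &= \hat E\{C\bar x(t) + v(t) \mid \mathcal{Y}_t^- \oplus \operatorname{span}\{\nu(t)\}\} \\ &= \hat E\{C\bar x(t) + v(t) \mid \mathcal{Y}_t^-\} + \hat E\{y(t) \mid \nu(t)\} = Cz(t) + \nu(t) \end{aligned}$$

where we see that $\hat E\{v(t) \mid \mathcal{Y}_t^-\} = 0$ and $\hat E\{y(t) \mid \nu(t)\} = \nu(t)$. Hence, defining $\tilde v(t) := \nu(t)$, we have (8.33b). We can prove (8.34) similarly to (8.29).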


Since $\tilde w(t) = \hat E\{\bar x(t+1) \mid \nu(t)\}$ belongs to the space spanned by $\tilde v(t) = \nu(t)$, we have

$$\tilde w(t) = \hat E\{\tilde w(t) \mid \tilde v(t)\} = E\{\tilde w(t)\tilde v^T(t)\}\big(E\{\tilde v(t)\tilde v^T(t)\}\big)^{-1}\tilde v(t) = SR^{-1}\tilde v(t)$$

Thus, by putting $K = SR^{-1}$, the Markov model of (8.33) is reduced to

$$z(t+1) = Az(t) + K\tilde v(t) \tag{8.35a}$$

$$y(t) = Cz(t) + \tilde v(t) \tag{8.35b}$$

This is the stationary Kalman filter for the system described by (8.27), since

$$z(t) = \hat E\{\bar x(t) \mid \mathcal{Y}_t^-\} = \Upsilon x(t)$$

is the one-step predicted estimate of the state vector $\bar x(t)$. Also, the state covariance matrix of (8.35a) is given by $E\{z(t)z^T(t)\} = \Pi = \Upsilon^2$, and the error covariance matrix is given by $P := E\{[\bar x(t) - z(t)][\bar x(t) - z(t)]^T\} = I_n - \Upsilon^2$.

We see that the Markov model of (8.27) is a forward model with the maximum state covariance matrix ($\Pi = I_n$) for the given data $(A, C, \bar C, \Lambda(0))$, while the forward Markov model of (8.33) has the minimum state covariance matrix ($\Pi = \Upsilon^2$).
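The covariance formulas (8.29) and (8.34) differ only in the value of $\Pi$. As a hedged illustration, assume a scalar AR(1) process $y(t) = ay(t-1) + e(t)$ with unit noise variance (a hypothetical example, not taken from the text). Then $\mathcal{X}_t^+ = \operatorname{span}\{y(t)\}$ and $\mathcal{X}_t^- = \operatorname{span}\{y(t-1)\}$, so the normalized bases $\bar x(t) = y(t)/\sqrt{\Lambda(0)}$ and $x(t) = y(t-1)/\sqrt{\Lambda(0)}$ can be written down by hand. The sketch below evaluates both sets of formulas with NumPy; the function name `noise_covariances` is illustrative.

```python
import numpy as np


def noise_covariances(A, C, Cbar, lam0, Pi):
    """Q, S, R as in (8.29)/(8.34): Q = Pi - A Pi A^T, S = Cbar^T - A Pi C^T, R = Lambda(0) - C Pi C^T."""
    Q = Pi - A @ Pi @ A.T
    S = Cbar.T - A @ Pi @ C.T
    R = lam0 - C @ Pi @ C.T
    return Q, S, R


# Hypothetical scalar AR(1): y(t) = a y(t-1) + e(t), var e = 1, so Lambda(0) = 1/(1 - a^2).
a = 0.8
g = 1.0 / (1.0 - a**2)
# Assumed bases for this example: xbar(t) = y(t)/sqrt(g), x(t) = y(t-1)/sqrt(g).
A = np.array([[a]])                   # E{xbar(t+1) xbar(t)}
C = np.array([[np.sqrt(g)]])          # E{y(t) xbar(t)}
Cbar = np.array([[a * np.sqrt(g)]])   # E{y(t) xbar(t+1)}
lam0 = np.array([[g]])
Upsilon = np.array([[a]])             # E{xbar(t) x(t)}

# Theorem 8.1: Pi = I_n
Q1, S1, R1 = noise_covariances(A, C, Cbar, lam0, np.eye(1))

# Theorem 8.2: Pi = Upsilon^2 and gain K = S R^{-1} as in (8.35)
Pi2 = Upsilon @ Upsilon
Q2, S2, R2 = noise_covariances(A, C, Cbar, lam0, Pi2)
K = S2 @ np.linalg.inv(R2)
P_err = np.eye(1) - Pi2               # error covariance I_n - Upsilon^2
```

In this example $\Pi = I_n$ gives $R = 0$ and $S = 0$ (the output is an exact multiple of $\bar x(t)$), while $\Pi = \Upsilon^2$ gives $R = 1$, the innovation variance, and $CK = a$, consistent with reading (8.35) as the Kalman filter.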


8.4 Canonical Correlations Between Future and Past 


In this section, we consider the canonical correlations between the future and the past of the stationary stochastic process $y$ of (8.16). To this end, we recall the two AREs associated with the stationary Kalman filter (5.75) and the stationary backward Kalman filter (5.88), i.e.,

$$\Sigma = A\Sigma A^T + (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}(\bar C^T - A\Sigma C^T)^T \tag{8.36}$$

and

$$\bar\Sigma = A^T\bar\Sigma A + (C^T - A^T\bar\Sigma\bar C^T)(\Lambda(0) - \bar C\bar\Sigma\bar C^T)^{-1}(C^T - A^T\bar\Sigma\bar C^T)^T \tag{8.37}$$

It is easy to see that the stabilizing solution $\Sigma$ of (8.36) is equal to the minimum solution $\Pi_*$, which is computable by Algorithm 2 of Subsection 7.4.2, and hence we have $\Sigma = \Pi_*$, the minimum solution of (7.37). But the stabilizing solution $\bar\Sigma$ of (8.37) is equal to the minimum solution $(\Pi^*)^{-1}$, which is computable by Algorithm 1, so that we have $\bar\Sigma = (\Pi^*)^{-1}$. Therefore, in terms of the stabilizing solutions $\Sigma$ and $\bar\Sigma$, the inequality of Theorem 7.4 is expressed as


$$\Sigma \le \Pi \le \bar\Sigma^{-1}$$


In the following, we show that the square roots of the eigenvalues of the product $\Sigma\bar\Sigma$ are the canonical correlations of the future and the past of the stationary process $y$. This is quite analogous to the fact that the Hankel singular values of a deterministic system $(A, B, C)$ are given by the square roots of the eigenvalues of the product of the reachability and observability Gramians (see Section 3.8).


Theorem 8.3. The canonical correlations of the future and the past of the stationary process $y$ are given by the square roots of the eigenvalues of the product $\Sigma\bar\Sigma$. If $\operatorname{rank}(H) = n$, then the canonical correlations between the future and the past are given by $(\sigma_1, \dots, \sigma_n, 0, 0, \dots)$.

Proof. Define the finite future and past by

$$f_k(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+k-1) \end{bmatrix}, \qquad p_k(t) = \begin{bmatrix} y(t-1) \\ y(t-2) \\ \vdots \\ y(t-k) \end{bmatrix}$$
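and also define $H_k := E\{f_k(t)p_k^T(t)\}$, $T_+(k) := E\{f_k(t)f_k^T(t)\}$ and $T_-(k) := E\{p_k(t)p_k^T(t)\}$. Then we see that

$$\lim_{k\to\infty} H_k = H, \qquad \lim_{k\to\infty} T_+(k) = T_+, \qquad \lim_{k\to\infty} T_-(k) = T_-$$

From Algorithm 1 of Subsection 7.4.2, it follows that

$$\bar\Sigma = \lim_{k\to\infty} \bar\Sigma_k = \lim_{k\to\infty} \mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k$$

Also, from Algorithm 2,

$$\Sigma = \lim_{k\to\infty} \Sigma_k = \lim_{k\to\infty} \mathcal{C}_k T_-^{-1}(k)\mathcal{C}_k^T$$

Let the Cholesky factorizations of the block Toeplitz matrices be $T_+(k) = L_kL_k^T$ and $T_-(k) = M_kM_k^T$. Then it follows that

$$\begin{aligned} \lambda(\Sigma_k\bar\Sigma_k) &= \lambda\big(\mathcal{C}_k T_-^{-1}(k)\mathcal{C}_k^T\mathcal{O}_k^T T_+^{-1}(k)\mathcal{O}_k\big) = \lambda\big((M_kM_k^T)^{-1}H_k^T(L_kL_k^T)^{-1}H_k\big) \\ &= \lambda\big((L_k^{-1}H_kM_k^{-T})^T(L_k^{-1}H_kM_k^{-T})\big) = \sigma^2(L_k^{-1}H_kM_k^{-T}) \end{aligned}$$

where we have used $H_k = \mathcal{O}_k\mathcal{C}_k$ and the fact that $\lambda(AB) = \lambda(BA)$ except for zero eigenvalues. It follows from Lemma 8.2 that the singular values of $L_k^{-1}H_kM_k^{-T}$ are the canonical correlations between $f_k(t)$ and $p_k(t)$. Thus, taking the limit,

$$\lambda^{1/2}(\Sigma\bar\Sigma) = \lim_{k\to\infty}\lambda^{1/2}(\Sigma_k\bar\Sigma_k) = \lim_{k\to\infty}\sigma(L_k^{-1}H_kM_k^{-T}) = \sigma(L^{-1}HM^{-T})$$

where $L$ and $M$ are respectively the Cholesky factors of the matrices $T_+$ and $T_-$.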


Thus we see that the square root of the $i$th eigenvalue of $\Sigma\bar\Sigma$ equals the $i$th canonical correlation between the future $f(t)$ and the past $p(t)$, as was to be proved.
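Theorem 8.3 can be checked numerically for a scalar process. A minimal sketch, assuming a hypothetical scalar ARMA(1,1) process written as an innovation model with $C = 1$: the finite-$k$ canonical correlations (singular values of $L_k^{-1}H_kM_k^{-T}$) are compared with $\lambda^{1/2}(\Sigma\bar\Sigma)$, where $\Sigma$ and $\bar\Sigma = (\Pi^*)^{-1}$ are taken from the two roots of the scalar form of the ARE (8.36). The eigenvalues of $\Sigma\bar\Sigma$ do not depend on the choice of state coordinates, so $C = 1$ is only a convenience. The quadratic in the code comments was derived by hand for this scalar case and is an assumption of the sketch, which the test re-checks indirectly.

```python
import numpy as np

# Hypothetical scalar ARMA(1,1): y(t) = a y(t-1) + e(t) + b e(t-1), var e = 1,
# i.e. the innovation model x(t+1) = a x(t) + (a + b) e(t), y(t) = x(t) + e(t).
a, b = 0.7, 0.4
K = a + b
P = K**2 / (1.0 - a**2)     # state covariance of the innovation model (C = 1)
cbar = a * P + K            # Cbar^T = E{x(t+1) y(t)}
lam0 = P + 1.0              # Lambda(0)


def lam(m):
    """Lambda(m) = E{y(t+m) y(t)}; scalar output, so Lambda(-m) = Lambda(m)."""
    m = abs(m)
    return lam0 if m == 0 else cbar * a ** (m - 1)


def future_past_cca(k):
    """Singular values of L_k^{-1} H_k M_k^{-T}: canonical correlations of f_k(t) and p_k(t)."""
    Tk = np.array([[lam(i - j) for j in range(k)] for i in range(k)])      # T_+(k) = T_-(k) here
    Hk = np.array([[lam(i + j + 1) for j in range(k)] for i in range(k)])  # H_k[i, j] = Lambda(i+j+1)
    Lk = np.linalg.cholesky(Tk)
    Mk = Lk
    X = np.linalg.solve(Lk, Hk)          # L_k^{-1} H_k
    X = np.linalg.solve(Mk, X.T).T       # L_k^{-1} H_k M_k^{-T}
    return np.linalg.svd(X, compute_uv=False)


# Scalar ARE (8.36) with C = 1 reduces to  S^2 - (2 a cbar + (1 - a^2) lam0) S + cbar^2 = 0.
# Its smaller root is Sigma = Pi_*, its larger root is Pi^*, and Sigma_bar = 1 / Pi^*.
roots = np.sort(np.roots([1.0, -(2 * a * cbar + (1 - a**2) * lam0), cbar**2]).real)
Sigma, Pi_max = roots
Sigma_bar = 1.0 / Pi_max
sigma_theory = np.sqrt(Sigma * Sigma_bar)   # square root of the eigenvalue of Sigma Sigma_bar
sigma_cca = future_past_cca(k=30)
```

Since $H$ has rank one here, only the first singular value should be nonzero, in line with the pattern $(\sigma_1, 0, 0, \dots)$ of the theorem.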


8.5 Balanced Stochastic Realization 


In this section, we consider a balanced stochastic realization based on the CCA. From the previous section, we see that in the balanced stochastic realization, the state covariance matrices of both the forward and backward realizations are equal to the diagonal matrix $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$; see also Definition 3.9.


8.5.1 Forward and Backward State Vectors 


We assume that $\operatorname{rank}(H) = n$. Let the Cholesky factorizations of the block Toeplitz matrices $T_+$ and $T_-$, defined by (8.19) and (8.20), be given by $T_+ = LL^T$ and $T_- = MM^T$, respectively². Then, as shown above, the canonical correlations between the future and past are given by the SVD of the normalized $H$, i.e.,

$$L^{-1}HM^{-T} = U\Sigma V^T$$

so that we have

$$H = LU\Sigma V^TM^T \tag{8.38}$$

From the assumption that $\operatorname{rank}(H) = n$, it follows that $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$ with $1 \ge \sigma_1 \ge \dots \ge \sigma_n > 0$, and $U^TU = I_n$, $V^TV = I_n$.
According to Lemma 8.2, we define two $n$-dimensional canonical vectors

$$\alpha(t) := V^TM^{-1}p(t), \qquad \beta(t) := U^TL^{-1}f(t)$$

Then it can be shown that $E\{\alpha(t)\alpha^T(t)\} = E\{\beta(t)\beta^T(t)\} = I_n$, and

$$E\{\beta(t)\alpha^T(t)\} = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$$

Thus we see that $(\sigma_1, \dots, \sigma_n)$ are the canonical correlations between $f(t)$ and $p(t)$.
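It therefore follows that the orthogonal projection of the future $f(t)$ onto the past $\mathcal{Y}_t^-$ is expressed as

$$\begin{aligned} \hat E\{f(t) \mid \mathcal{Y}_t^-\} &= E\{f(t)p^T(t)\}\big(E\{p(t)p^T(t)\}\big)^{-1}p(t) = HT_-^{-1}p(t) \\ &= LU\Sigma V^TM^T(MM^T)^{-1}p(t) = LU\Sigma\alpha(t) \end{aligned} \tag{8.39}$$

Hence, we see that the canonical vector $\alpha(t)$ is an orthonormal basis of the forward predictor space $\mathcal{X}_t^- = \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\}$. Similarly, the orthogonal projection of the past $p(t)$ onto the future space $\mathcal{Y}_t^+$ is given by

$$\hat E\{p(t) \mid \mathcal{Y}_t^+\} = H^TT_+^{-1}f(t) = MV\Sigma\beta(t) \tag{8.40}$$

This implies that the canonical vector $\beta(t)$ is an orthonormal basis of the backward predictor space $\mathcal{X}_t^+ = \hat E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$.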
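Let the extended observability and reachability matrices be defined by

$$\mathcal{O} := LU\Sigma^{1/2}, \qquad \mathcal{C} := \Sigma^{1/2}V^TM^T \tag{8.41}$$

Then, from (8.38), the block Hankel matrix $H$ has the decomposition

$$H = LU\Sigma^{1/2}\cdot\Sigma^{1/2}V^TM^T = \mathcal{O}\mathcal{C} \tag{8.42}$$

where $\operatorname{rank}(\mathcal{O}) = \operatorname{rank}(\mathcal{C}) = n$. Let $x(t)$ and $x_b(t-1)$ respectively be given by

$$x(t) := \Sigma^{1/2}\alpha(t) = \mathcal{C}T_-^{-1}p(t) \tag{8.43}$$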


² Since $H$, $T_+$, $T_-$ are infinite dimensional, it should be noted that the manipulations of these matrices are rather formal. An operator-theoretic treatment of infinite-dimensional matrices is beyond the scope of this book; see Chapter 12 of [183].
and

$$x_b(t-1) := \Sigma^{1/2}\beta(t) = \mathcal{O}^TT_+^{-1}f(t) \tag{8.44}$$

The former is called a forward state vector, and the latter a backward state vector. By definition, we see that

$$E\{x(t)x^T(t)\} = \Sigma = E\{x_b(t-1)x_b^T(t-1)\} \tag{8.45}$$

It follows from (8.41) and (8.43) that (8.39) is rewritten as

$$\hat E\{f(t) \mid \mathcal{Y}_t^-\} = \mathcal{O}x(t) \tag{8.46}$$

This implies that the past data necessary for predicting the future $f(t)$ is compressed into the forward state vector $x(t)$. Similarly, from (8.44), we see that (8.40) is expressed as

$$\hat E\{p(t) \mid \mathcal{Y}_t^+\} = \mathcal{C}^Tx_b(t-1) \tag{8.47}$$

so that $x_b(t-1)$ is the backward state vector that is needed to predict the past $p(t)$ by means of the future data.

In the next subsection, we show that a forward (backward) Markov model for the output vector $y$ is derived by using the state vector $x(t)$ ($x_b(t-1)$). From (8.45), it can be shown that the state covariance matrices of both the forward and backward Markov models are equal to the canonical correlation matrix $\Sigma$, so that these Markov models are called balanced stochastic realizations.


8.5.2 Innovation Representations 


We derive innovation representations for a stationary process by means of the vectors $x(t)$ and $x_b(t-1)$ obtained by using the CCA, and show that these representations are balanced.


Theorem 8.4. In terms of the state vector defined by (8.43), a forward innovation model for $y$ is given by

$$x(t+1) = Ax(t) + Ke(t) \tag{8.48a}$$

$$y(t) = Cx(t) + e(t) \tag{8.48b}$$

where $e$ is the innovation process defined by

$$e(t) = y(t) - \hat E\{y(t) \mid \mathcal{Y}_t^-\} \tag{8.49}$$

which is a white noise with mean zero. Moreover, it can be shown that the matrices $A$, $C$, $\bar C$, $K$, $R$ are given by
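$$A = \mathcal{O}^\dagger\mathcal{O}^{\uparrow} = \Sigma^{-1/2}U^TL^{-1}H^{\leftarrow}M^{-T}V\Sigma^{-1/2} \in \mathbb{R}^{n\times n} \tag{8.50}$$

$$C = \mathcal{O}(1:p,\ 1:n) \in \mathbb{R}^{p\times n} \tag{8.51}$$

$$\bar C^T = \mathcal{C}(1:n,\ 1:p) \in \mathbb{R}^{n\times p} \tag{8.52}$$

$$R = \Lambda(0) - C\Sigma C^T \in \mathbb{R}^{p\times p} \tag{8.53}$$

$$K = (\bar C^T - A\Sigma C^T)R^{-1} \in \mathbb{R}^{n\times p} \tag{8.54}$$

where $(\cdot)^{\leftarrow}$ and $(\cdot)^{\uparrow}$ denote the operations that remove the first block column and the first block row, respectively. Also, $\Sigma$ is a stabilizing solution of the ARE

$$\Sigma = A\Sigma A^T + (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}(\bar C^T - A\Sigma C^T)^T \tag{8.55}$$

and $A_K = A - KC$ is stable.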
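Proof. 1° Define $w(t)$ as

$$w(t) := x(t+1) - \hat E\{x(t+1) \mid x(t)\} \tag{8.56}$$

By the definition of orthogonal projection,

$$\hat E\{x(t+1) \mid x(t)\} = E\{x(t+1)x^T(t)\}\big(E\{x(t)x^T(t)\}\big)^{-1}x(t) = Ax(t)$$

where, from (8.43),

$$A = \mathcal{C}T_-^{-1}E\{p(t+1)p^T(t)\}T_-^{-1}\mathcal{C}^T\Sigma^{-1}$$

Also, by the definition of $p(t)$, it can be shown that $E\{p(t+1)p^T(t)\} = T_-^{\leftarrow}$, since $p(t)$ is $p(t+1)$ with its first block removed. From the decomposition $T_- = MM^T$, we have $T_-^{\leftarrow} = M(M^T)^{\leftarrow}$, so that

$$\begin{aligned} A &= \mathcal{C}(MM^T)^{-1}M(M^T)^{\leftarrow}(MM^T)^{-1}\mathcal{C}^T\Sigma^{-1} \\ &= \Sigma^{1/2}V^TM^T(MM^T)^{-1}M(M^T)^{\leftarrow}(MM^T)^{-1}MV\Sigma^{1/2}\Sigma^{-1} \\ &= \Sigma^{1/2}V^T(M^T)^{\leftarrow}M^{-T}V\Sigma^{-1/2} \end{aligned} \tag{8.57}$$

The last equality uses $M^T(MM^T)^{-1}M = I$. Moreover, we see from (8.38) that $\Sigma^{-1/2}U^TL^{-1}H = \Sigma^{1/2}V^TM^T$; removing the first block column of both sides gives

$$\Sigma^{-1/2}U^TL^{-1}H^{\leftarrow}M^{-T}V\Sigma^{-1/2} = \Sigma^{1/2}V^T(M^T)^{\leftarrow}M^{-T}V\Sigma^{-1/2}$$

In addition, the shift relation $\mathcal{O}\mathcal{C}^{\leftarrow} = \mathcal{O}^{\uparrow}\mathcal{C}$ of Theorem 6.1 (iv) gives $\mathcal{O}^{\uparrow} = \mathcal{O}A$, i.e., $A = \mathcal{O}^\dagger\mathcal{O}^{\uparrow}$.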
Thus, we have (8.50) from (8.57).

2° From (8.42) and (8.43),

$$\begin{aligned} \hat E\{y(t) \mid \mathcal{Y}_t^-\} &= E\{y(t)p^T(t)\}\big(E\{p(t)p^T(t)\}\big)^{-1}p(t) \\ &= E\{y(t)[y^T(t-1)\ \ y^T(t-2)\ \cdots]\}T_-^{-1}p(t) \\ &= [\Lambda(1)\ \ \Lambda(2)\ \cdots]T_-^{-1}p(t) = H(1:p,\ :)T_-^{-1}p(t) \\ &= \mathcal{O}(1:p,\ 1:n)\mathcal{C}T_-^{-1}p(t) = \mathcal{O}(1:p,\ 1:n)x(t) = Cx(t) \end{aligned}$$

Thus, we see that (8.48b) and (8.51) hold. Suppose that $l > t$. Since

$$e(l) = y(l) - Cx(l) \perp \mathcal{Y}_l^-, \qquad e(t) \in \mathcal{Y}_{t+1}^- \subset \mathcal{Y}_l^-$$

we have $e(l) \perp e(t)$, $l = t+1, t+2, \dots$, implying that $e(t)$ is a white noise. Also, computing the covariance matrices of both sides of (8.48b) yields (8.53).
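3° From (8.43) and (8.56), we have

$$w(t) = x(t+1) - Ax(t) \in \mathcal{Y}_{t+1}^-, \qquad w(t) \perp \mathcal{Y}_t^-$$

and $\mathcal{Y}_{t+1}^- = \mathcal{Y}_t^- \oplus \operatorname{span}\{e(t)\}$. Thus, $w(t)$ belongs to $\operatorname{span}\{e(t)\}$. This implies that $w(t)$ can be expressed in terms of the innovation process $e(t)$, so that

$$w(t) = \hat E\{w(t) \mid e(t)\} = E\{w(t)e^T(t)\}R^{-1}e(t) = Ke(t)$$

However, since $w(t) \perp x(t)$, we get

$$E\{w(t)e^T(t)\} = E\{w(t)[y(t) - Cx(t)]^T\} = E\{w(t)y^T(t)\}$$

Hence, from (8.43) and (8.56),

$$\begin{aligned} E\{w(t)e^T(t)\} &= E\{[\mathcal{C}T_-^{-1}p(t+1) - Ax(t)]y^T(t)\} \\ &= \mathcal{C}T_-^{-1}E\{p(t+1)y^T(t)\} - AE\{x(t)[Cx(t) + e(t)]^T\} \end{aligned} \tag{8.58}$$

From the definition of $T_-$, the first term on the right-hand side of the above equation becomes

$$\mathcal{C}T_-^{-1}E\{p(t+1)y^T(t)\} = \mathcal{C}T_-^{-1}\begin{bmatrix} \Lambda(0) \\ \Lambda^T(1) \\ \Lambda^T(2) \\ \vdots \end{bmatrix} = \mathcal{C}\begin{bmatrix} I_p \\ 0 \\ 0 \\ \vdots \end{bmatrix} = \mathcal{C}(:,\ 1:p) =: \bar C^T$$

Also, since $e(t) \perp x(t)$, we have $AE\{x(t)[Cx(t) + e(t)]^T\} = A\Sigma C^T$ in the second term of (8.58), so that we have (8.54) and (8.52). Thus the state equation (8.48a) is derived. Moreover, computing the covariance matrices of both sides of (8.48a), we get the ARE (8.55). Finally, the stability of $A_K = A - KC$ follows from Lemma 5.14; see also Theorem 5.4.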
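The construction of Theorem 8.4 can be carried out numerically from exact covariances truncated at a finite $k$, in the spirit of the limits used in the proof of Theorem 8.3. The sketch below assumes a scalar output, uses a hypothetical ARMA(1,1) example ($a = 0.7$, $b = 0.4$, unit innovation variance), obtains $A$ from the shift invariance $\mathcal{O}^{\uparrow} = \mathcal{O}A$ (as in Step A4 of Algorithm A later in this chapter) rather than from the closed-form expression, and checks the ARE (8.55). It is a sketch, not a general-purpose implementation: for $p > 1$, $T_-$ would have to be built separately from $T_+$.

```python
import numpy as np

# Hypothetical scalar ARMA(1,1) example: y(t) = a y(t-1) + e(t) + b e(t-1), var e = 1
a, b = 0.7, 0.4
P0 = (a + b) ** 2 / (1.0 - a**2)
cbar0 = a * P0 + (a + b)


def lam(m):
    """Covariance Lambda(m) of the example process (scalar, so Lambda(-m) = Lambda(m))."""
    m = abs(m)
    return P0 + 1.0 if m == 0 else cbar0 * a ** (m - 1)


def balanced_realization(lam, k, n):
    """Balanced forward innovation model (Theorem 8.4) from Lambda(0), ..., Lambda(2k-1), p = 1."""
    T = np.array([[lam(i - j) for j in range(k)] for i in range(k)])      # T_+ = T_- for scalar y
    H = np.array([[lam(i + j + 1) for j in range(k)] for i in range(k)])  # block Hankel H
    L = np.linalg.cholesky(T)
    M = L
    Hn = np.linalg.solve(M, np.linalg.solve(L, H).T).T                    # L^{-1} H M^{-T}
    U, s, Vt = np.linalg.svd(Hn)
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    O = (L @ U) * np.sqrt(s)                      # O = L U Sigma^{1/2}, cf. (8.41)
    Cc = (np.sqrt(s)[:, None] * Vt) @ M.T         # C = Sigma^{1/2} V^T M^T, cf. (8.41)
    A = np.linalg.pinv(O[:-1]) @ O[1:]            # shift invariance: O^up = O A
    C, CbarT = O[:1, :], Cc[:, :1]                # (8.51), (8.52)
    Sigma = np.diag(s)
    R = lam(0) - C @ Sigma @ C.T                  # (8.53)
    K = (CbarT - A @ Sigma @ C.T) @ np.linalg.inv(R)   # (8.54)
    return A, C, CbarT, Sigma, R, K


A, C, CbarT, Sigma, R, K = balanced_realization(lam, k=30, n=1)
G = CbarT - A @ Sigma @ C.T
are_residual = Sigma - (A @ Sigma @ A.T + G @ np.linalg.inv(R) @ G.T)    # (8.55)
A_K = A - K @ C
```

For this example, $R$ should come out close to the innovation variance $1$, and $A_K$ equals $-b$ up to rounding, the zero of the hypothetical ARMA model, so the inverse system is stable as the theorem states.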
We see that the stochastic realization results of Theorem 8.4 provide a forward Markov model based on the canonical vector $\alpha(t)$. We can also derive a backward Markov model for the stationary process $y$ in terms of the canonical vector $\beta(t)$.


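Theorem 8.5. By means of the state vector defined by (8.44), we have the following backward Markov model

$$x_b(t-1) = A^Tx_b(t) + \bar Ke_b(t) \tag{8.59a}$$

$$y(t) = \bar Cx_b(t) + e_b(t) \tag{8.59b}$$

where the innovation process $e_b$ defined by

$$e_b(t) = y(t) - \hat E\{y(t) \mid \mathcal{Y}_{t+1}^+\}$$

is a white noise with mean zero and covariance matrix $\bar R$, where $\bar R$ and $\bar K$ are respectively given by

$$\bar R = \Lambda(0) - \bar C\Sigma\bar C^T \in \mathbb{R}^{p\times p} \tag{8.60}$$

and

$$\bar K = (C^T - A^T\Sigma\bar C^T)\bar R^{-1} \in \mathbb{R}^{n\times p} \tag{8.61}$$

Moreover, the covariance matrix $\Sigma$ for the backward model satisfies the ARE

$$\Sigma = A^T\Sigma A + (C^T - A^T\Sigma\bar C^T)(\Lambda(0) - \bar C\Sigma\bar C^T)^{-1}(C^T - A^T\Sigma\bar C^T)^T \tag{8.62}$$

and $A^T - \bar K\bar C$ is stable.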


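Proof. We can prove the theorem by using the same technique as in the proof of Theorem 8.4, but here we derive the result from (8.44) by a direct calculation. From the definition of $T_+$ and (8.44),

$$x_b(t-1) = \mathcal{O}^TT_+^{-1}f(t) = \begin{bmatrix} C^T & A^T\mathcal{O}^T \end{bmatrix}\begin{bmatrix} \Lambda(0) & \bar C\mathcal{O}^T \\ \mathcal{O}\bar C^T & T_+ \end{bmatrix}^{-1}\begin{bmatrix} y(t) \\ f(t+1) \end{bmatrix} \tag{8.63}$$

The inverse of the block matrix in (8.63) is given by

$$\begin{bmatrix} \Lambda(0) & \bar C\mathcal{O}^T \\ \mathcal{O}\bar C^T & T_+ \end{bmatrix}^{-1} = \begin{bmatrix} V & -V\bar C\mathcal{O}^TT_+^{-1} \\ -T_+^{-1}\mathcal{O}\bar C^TV & T_+^{-1} + T_+^{-1}\mathcal{O}\bar C^TV\bar C\mathcal{O}^TT_+^{-1} \end{bmatrix}$$

where, from (8.45),

$$V = (\Lambda(0) - \bar C\mathcal{O}^TT_+^{-1}\mathcal{O}\bar C^T)^{-1} = (\Lambda(0) - \bar C\Sigma\bar C^T)^{-1}$$

Thus, computing the right-hand side of (8.63) yields

$$\begin{aligned} x_b(t-1) &= (C^T - A^T\mathcal{O}^TT_+^{-1}\mathcal{O}\bar C^T)Vy(t) + A^T\mathcal{O}^TT_+^{-1}f(t+1) \\ &\quad + A^T\mathcal{O}^TT_+^{-1}\mathcal{O}\bar C^TV\bar C\mathcal{O}^TT_+^{-1}f(t+1) - C^TV\bar C\mathcal{O}^TT_+^{-1}f(t+1) \end{aligned}$$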
By using $\mathcal{O}^TT_+^{-1}\mathcal{O} = \Sigma$ and $\mathcal{O}^TT_+^{-1}f(t+1) = x_b(t)$, we get

$$x_b(t-1) = A^Tx_b(t) + (C^T - A^T\Sigma\bar C^T)V[y(t) - \bar Cx_b(t)]$$

Define $\bar R$ and $\bar K$ as in (8.60) and (8.61), respectively. Then we immediately obtain (8.59a). Also, we see from (8.47) that

$$\hat E\{p(t+1) \mid \mathcal{Y}_{t+1}^+\} = \mathcal{C}^Tx_b(t)$$

From the first $p$ rows of the above expression, we get $\hat E\{y(t) \mid \mathcal{Y}_{t+1}^+\} = \bar Cx_b(t)$, so that we have (8.59b). By definition, $e_b(t) \perp \mathcal{Y}_{t+1}^+$, and hence $e_b(t) \perp y(t+l)$ for $l = 1, 2, \dots$. Also, since $x_b(t+l) \in \mathcal{Y}_{t+l+1}^+ \subset \mathcal{Y}_{t+1}^+$ for $l = 1, 2, \dots$, it follows that

$$E\{e_b(t)e_b^T(t+l)\} = E\{e_b(t)[y(t+l) - \bar Cx_b(t+l)]^T\} = 0$$

holds for $l = 1, 2, \dots$, implying that $e_b$ is a backward white noise. Its covariance matrix is given by (8.60). Finally, computing the covariance matrices of both sides of (8.59a) yields (8.62).


The state covariance matrices of the two stochastic realizations given in Theorems 8.4 and 8.5 are equal, and are given by $\operatorname{cov}\{x(t)\} = \Sigma = \operatorname{cov}\{x_b(t-1)\}$. Also, the two AREs (8.55) and (8.62) have the common solution $\Sigma$, a diagonal matrix with the canonical correlations as its diagonal elements. It follows from Algorithms 1 and 2 shown in Subsection 7.4.2 that $\Sigma$ is the minimum solution of both AREs. Thus, the two systems defined by (8.48) and (8.59) are respectively the forward and backward Markov models for the stationary process $y$ with the same state covariance matrix. In this sense, the pair of realizations (8.48) and (8.59) is called stochastically balanced.


8.6 Reduced Stochastic Realization 


In Sections 4.8 and 7.2, we have introduced a backward Markov model as a dual of the forward Markov model.

In this section, we first derive a forward Markov model corresponding to the backward model of (8.59). This gives a forward model for the stationary process $y$ with the maximum state covariance matrix $\Sigma^* = \Sigma^{-1}$.


Lemma 8.5. Let $x^*(t) := \Sigma^{-1}x_b(t-1)$, and let the state space model with $x^*(t)$ as the state vector be given by

$$x^*(t+1) = Ax^*(t) + K^*e^*(t) \tag{8.64a}$$

$$y(t) = Cx^*(t) + e^*(t) \tag{8.64b}$$

Then the above realization is a forward Markov model with $\operatorname{cov}\{x^*(t)\} = \Sigma^{-1}$, which satisfies the ARE

$$\Sigma^{-1} = A\Sigma^{-1}A^T + (\bar C^T - A\Sigma^{-1}C^T)(\Lambda(0) - C\Sigma^{-1}C^T)^{-1}(\bar C^T - A\Sigma^{-1}C^T)^T \tag{8.65}$$

Also, the covariance matrix of $e^*$ and the gain matrix $K^*$ are respectively given by

$$R^* = \Lambda(0) - C\Sigma^{-1}C^T \tag{8.66}$$

and

$$K^* = (\bar C^T - A\Sigma^{-1}C^T)(R^*)^{-1} \tag{8.67}$$

Proof. The proof is deferred to the Appendix in Section 8.11.


We see that (8.65) is the same as (8.55), and $\Sigma$ and $\Sigma^{-1}$ are respectively the minimum and maximum solutions of the ARE. Since the elements of the diagonal matrix $\Sigma$ are the canonical correlations between the future and the past, they lie in the interval $[0, 1]$. By assumption, we have $\sigma_n > 0$, so that if $\sigma_1 < 1$, then $\Sigma^{-1} - \Sigma > 0$ holds. It therefore follows from Theorem 7.5 (ii), (iii) that

$$A_K := A - KC$$

is stable, implying that the inverse system of (8.48) is stable. It is also shown in [69, 107] that, under the assumption $\Lambda(0) > 0$, the condition $\sigma_1 < 1$ implies that $\Phi(\omega) > 0$, $-\pi < \omega \le \pi$.
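The fact that $\Sigma$ and $\Sigma^{-1}$ solve the same ARE, with $A - KC$ stable only for the minimum solution, can be seen in a scalar example. A minimal sketch, assuming the hypothetical ARMA(1,1) process $y(t) = 0.7y(t-1) + e(t) + 0.4e(t-1)$: starting from its innovation model with $C = 1$, the rescaling $x \mapsto x/\sqrt{\bar c}$ makes $C = \bar C$, and for this scalar example it places the minimum solution at $\sigma = P/\bar c$. That the rescaled coordinates are balanced (minimum solution $\sigma$, maximum solution $1/\sigma$) is an assumption specific to this example, which the residual checks below confirm numerically.

```python
import numpy as np

# Hypothetical scalar ARMA(1,1) innovation model (C = 1):
# x(t+1) = a x(t) + (a + b) e(t), y(t) = x(t) + e(t), var e = 1.
a, b = 0.7, 0.4
P0 = (a + b) ** 2 / (1.0 - a**2)
cbar0 = a * P0 + (a + b)
lam0 = P0 + 1.0

# Assumed balancing for this scalar example: rescale x -> x / sqrt(cbar0) so that C = Cbar.
A = a
C = Cbar = np.sqrt(cbar0)
sigma = P0 / cbar0


def are_residual(S):
    """Residual of the scalar ARE (8.55) = (8.65) at a candidate solution S."""
    return A * S * A + (Cbar - A * S * C) ** 2 / (lam0 - C * S * C) - S


def closed_loop(S):
    """A - K C with K = (Cbar^T - A S C^T)(Lambda(0) - C S C^T)^{-1}, cf. (8.54) and (8.67)."""
    gain = (Cbar - A * S * C) / (lam0 - C * S * C)
    return A - gain * C


R_star = lam0 - C * (1.0 / sigma) * C     # (8.66)
```

Both `are_residual(sigma)` and `are_residual(1.0 / sigma)` should vanish. In this example the closed-loop value for $\sigma$ lies inside the unit circle, whereas the one formed with $\sigma^{-1}$ lies outside it.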

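We now consider a reduction of the Markov model constructed in Theorem 8.4. We partition the covariance matrix of the state vector as $\Sigma_1 = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$ and $\Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \dots, \sigma_n)$. Accordingly, we define

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}, \qquad \bar C = \begin{bmatrix} \bar C_1 & \bar C_2 \end{bmatrix} \tag{8.68}$$

Also, we consider the transfer matrix defined by (7.11)

$$Z(z) = C(zI - A)^{-1}\bar C^T + \tfrac{1}{2}\Lambda(0) \tag{8.69}$$

and its reduced model

$$Z_r(z) = C_1(zI_r - A_{11})^{-1}\bar C_1^T + \tfrac{1}{2}\Lambda(0) \tag{8.70}$$

The following lemma gives conditions under which the reduced model $Z_r(z)$ becomes (strictly) positive real.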


Lemma 8.6. Suppose that $(A, C, \bar C)$ is minimal and balanced. Then, if $Z(z)$ is (strictly) positive real, so is $Z_r(z)$. Moreover, if $\sigma_r > \sigma_{r+1}$, the reduced model $Z_r(z)$ is minimal³.

Proof. (i) Suppose first that $Z(z)$ is positive real. From (8.55),

³ In general, this is not a balanced model.
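$$M(\Sigma) = \begin{bmatrix} \Sigma - A\Sigma A^T & \bar C^T - A\Sigma C^T \\ \bar C - C\Sigma A^T & \Lambda(0) - C\Sigma C^T \end{bmatrix} = \begin{bmatrix} K \\ I_p \end{bmatrix}R\begin{bmatrix} K^T & I_p \end{bmatrix} \ge 0 \tag{8.71}$$

By using the partitions in (8.68), $M(\Sigma)$ is expressed as

$$\begin{bmatrix} \Sigma_1 - A_{11}\Sigma_1A_{11}^T - A_{12}\Sigma_2A_{12}^T & -A_{11}\Sigma_1A_{21}^T - A_{12}\Sigma_2A_{22}^T & \bar C_1^T - A_{11}\Sigma_1C_1^T - A_{12}\Sigma_2C_2^T \\ -A_{21}\Sigma_1A_{11}^T - A_{22}\Sigma_2A_{12}^T & \Sigma_2 - A_{21}\Sigma_1A_{21}^T - A_{22}\Sigma_2A_{22}^T & \bar C_2^T - A_{21}\Sigma_1C_1^T - A_{22}\Sigma_2C_2^T \\ \bar C_1 - C_1\Sigma_1A_{11}^T - C_2\Sigma_2A_{12}^T & \bar C_2 - C_1\Sigma_1A_{21}^T - C_2\Sigma_2A_{22}^T & \Lambda(0) - C_1\Sigma_1C_1^T - C_2\Sigma_2C_2^T \end{bmatrix}$$

Deleting the second block row and column from the above matrix gives

$$\begin{bmatrix} \Sigma_1 - A_{11}\Sigma_1A_{11}^T - A_{12}\Sigma_2A_{12}^T & \bar C_1^T - A_{11}\Sigma_1C_1^T - A_{12}\Sigma_2C_2^T \\ \bar C_1 - C_1\Sigma_1A_{11}^T - C_2\Sigma_2A_{12}^T & \Lambda(0) - C_1\Sigma_1C_1^T - C_2\Sigma_2C_2^T \end{bmatrix} \ge 0 \tag{$*$}$$

In terms of $(A_{11}, C_1, \bar C_1, \Lambda(0))$, we define

$$M_1(\Pi) := \begin{bmatrix} \Pi - A_{11}\Pi A_{11}^T & \bar C_1^T - A_{11}\Pi C_1^T \\ \bar C_1 - C_1\Pi A_{11}^T & \Lambda(0) - C_1\Pi C_1^T \end{bmatrix}$$

Then it follows from ($*$) that

$$M_1(\Sigma_1) \ge \begin{bmatrix} A_{12} \\ C_2 \end{bmatrix}\Sigma_2\begin{bmatrix} A_{12}^T & C_2^T \end{bmatrix} \ge 0$$

Since $\Sigma_2 > 0$, we have $M_1(\Sigma_1) \ge 0$ with $\Sigma_1 > 0$.

We show that $A_{11}$ is stable. Since (8.71) gives a full-rank decomposition of $M(\Sigma)$, we see from Theorem 7.1 that $(A, K)$ is reachable. By replacing $B$ by $KR^{1/2}$ in the proof of Lemma 3.7 (i), it can be shown that $A_{11}$ is stable. Since, as shown above, $M_1(\Sigma_1) \ge 0$ with $\Sigma_1 > 0$, this implies that $Z_r(z)$ is a positive real matrix; see the comment following (7.26) in Subsection 7.3.1.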

(ii) Suppose that $Z(z)$ is strictly positive real. Since $(A, C, \bar C)$ is minimal, it follows from Theorem 7.5 (ii), (iii) that $\Sigma^{-1} - \Sigma > 0$ and $R = \Lambda(0) - C\Sigma C^T > 0$. Since $\Sigma_1$ is a submatrix of $\Sigma$, we see that $\Sigma_1^{-1} - \Sigma_1 > 0$ and $\Lambda(0) - C_1\Sigma_1C_1^T > 0$. As already shown in (i), we have $M_1(\Sigma_1) \ge 0$, and also $M_1(\Sigma_1^{-1}) \ge 0$, since both $\Sigma$ and $\Sigma^{-1}$ satisfy the same ARE. In other words, there exist two positive definite solutions $\Sigma_1$ and $\Sigma_1^{-1}$ satisfying the LMI and

$$\Sigma_1^{-1} - \Sigma_1 > 0$$

It therefore follows from Lemma 7.8 that $Z_r(z)$ is strictly positive real.

(iii) Suppose that $\sigma_r > \sigma_{r+1}$ holds. Then it follows from Lemma 3.7 (ii) that $Z_r(z)$ is a minimal realization.


Thus we have shown that the reduced model $Z_r(z)$ is (strictly) positive real and minimal, but not balanced. It should be noted that the minimal solution $\Pi_*$ of $M_1(\Pi) \ge 0$ satisfies the ARE

$$\Pi_* = A_{11}\Pi_*A_{11}^T + (\bar C_1^T - A_{11}\Pi_*C_1^T)(\Lambda(0) - C_1\Pi_*C_1^T)^{-1}(\bar C_1^T - A_{11}\Pi_*C_1^T)^T \tag{8.72}$$

so that the gain matrix is expressed as

$$K_{1*} = (\bar C_1^T - A_{11}\Pi_*C_1^T)(\Lambda(0) - C_1\Pi_*C_1^T)^{-1} \tag{8.73}$$

Then the reduced-order Markov model of (8.48) is given by

$$x_1(t+1) = A_{11}x_1(t) + K_{1*}\hat e(t) \tag{8.74a}$$

$$y(t) = C_1x_1(t) + \hat e(t) \tag{8.74b}$$

where $\operatorname{cov}\{\hat e(t)\} = \Lambda(0) - C_1\Pi_*C_1^T$, and $A_{11} - K_{1*}C_1$ is stable.


Corollary 8.1. The reduced order Markov model of (8.74) is stable, and inversely 
stable. 


It should be noted that $K_{1*}$ of (8.73) is different from the gain matrix

$$K_1 = (\bar C_1^T - A_{11}\Sigma_1C_1^T - A_{12}\Sigma_2C_2^T)(\Lambda(0) - C_1\Sigma_1C_1^T - C_2\Sigma_2C_2^T)^{-1}$$

which is the first block element of $K$ obtained from (8.71). Moreover, the reduced Markov model with $(A_{11}, C_1, \bar C_1, K_1)$ is not necessarily of minimum phase, since $A_{11} - K_1C_1$ may not be stable.

A remaining issue, therefore, is whether there exists a model reduction procedure that keeps positivity and balancedness simultaneously. The answer to this question is affirmative. In fact, according to Lemma 3.8, we can define

$$A_r = A_{11} + A_{12}(\alpha I - A_{22})^{-1}A_{21} \tag{8.75a}$$

$$C_r = C_1 + C_2(\alpha I - A_{22})^{-1}A_{21} \tag{8.75b}$$

$$\bar C_r = \bar C_1 + \bar C_2(\alpha I - A_{22}^T)^{-1}A_{12}^T \tag{8.75c}$$

$$\Lambda_r(0) = \Lambda(0) + C_2(\alpha I - A_{22})^{-1}\bar C_2^T + \bar C_2(\alpha I - A_{22}^T)^{-1}C_2^T \tag{8.75d}$$

where $|\alpha| = 1$. Then we have the following lemma.


Lemma 8.7. Suppose that $Z(z) = (A, C, \bar C, \tfrac{1}{2}\Lambda(0))$ is strictly positive real, minimal and balanced. If $\sigma_r > \sigma_{r+1}$ holds, then $Z_r(z) = (A_r, C_r, \bar C_r, \tfrac{1}{2}\Lambda_r(0))$ of (8.75) is strictly positive real, minimal and balanced.

Proof. See [106, 108]. It should be noted that the expression of $\Lambda_r(0)$ is different from that of Lemma 3.8 in order to keep it symmetric.
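One property of (8.75) that is easy to check numerically is that it reproduces $Z(z)$ exactly at $z = \alpha$, which is the familiar property of singular perturbation approximation; for a scalar output the two cross terms in (8.75d) coincide, so $Z_r(\alpha) = Z(\alpha)$. The sketch below restricts $\alpha$ to the real values $\pm 1$ and uses a hypothetical third-order example; it does not check balancedness or positive realness, which Lemma 8.7 asserts only for balanced, strictly positive real inputs. The function names are illustrative.

```python
import numpy as np


def singular_perturbation(A, C, Cbar, lam0, r, alpha=1.0):
    """Reduced (A_r, C_r, Cbar_r, Lambda_r(0)) of (8.75), assuming real alpha = +1 or -1."""
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    C1, C2 = C[:, :r], C[:, r:]
    Cb1, Cb2 = Cbar[:, :r], Cbar[:, r:]
    W = np.linalg.inv(alpha * np.eye(A22.shape[0]) - A22)    # (alpha I - A22)^{-1}
    Ar = A11 + A12 @ W @ A21                                 # (8.75a)
    Cr = C1 + C2 @ W @ A21                                   # (8.75b)
    Cbr = Cb1 + Cb2 @ W.T @ A12.T                            # (8.75c), W^T = (alpha I - A22^T)^{-1}
    lam0_r = lam0 + C2 @ W @ Cb2.T + Cb2 @ W.T @ C2.T        # (8.75d)
    return Ar, Cr, Cbr, lam0_r


def Z(z, A, C, Cbar, lam0):
    """Z(z) = C (zI - A)^{-1} Cbar^T + Lambda(0)/2, cf. (8.69)."""
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, Cbar.T) + 0.5 * lam0


# Hypothetical third-order scalar example (not claimed to be balanced or positive real)
A = np.array([[0.5, 0.1, 0.0], [0.2, 0.3, 0.1], [0.0, 0.1, -0.2]])
C = np.array([[1.0, 0.5, 0.2]])
Cbar = np.array([[0.8, 0.3, 0.1]])
lam0 = np.array([[2.0]])

reduced = {al: singular_perturbation(A, C, Cbar, lam0, r=2, alpha=al) for al in (1.0, -1.0)}
```

For each choice of $\alpha$, `Z(alpha, A, C, Cbar, lam0)` and `Z(alpha, *reduced[alpha])` should coincide.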


So far we have considered the stochastic realization problem under the assumption that an infinite time series is available. In the next section, we shall derive algorithms for identifying state space models based on given finite observed data.
8.7 Stochastic Realization Algorithms


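Let a finite collection of data be given by $\{y(t),\ t = 0, 1, \dots, N + 2k - 2\}$, where $k > 0$ and $N$ is sufficiently large. We assume that the given data is a finite sample from a stationary process. We define the block Toeplitz matrix⁴

$$Y_{0|k-1} = \begin{bmatrix} y(k-1) & y(k) & \cdots & y(N+k-2) \\ y(k-2) & y(k-1) & \cdots & y(N+k-3) \\ \vdots & \vdots & \ddots & \vdots \\ y(0) & y(1) & \cdots & y(N-1) \end{bmatrix} \in \mathbb{R}^{kp\times N}$$

and the block Hankel matrix

$$Y_{k|2k-1} = \begin{bmatrix} y(k) & y(k+1) & \cdots & y(k+N-1) \\ y(k+1) & y(k+2) & \cdots & y(k+N) \\ \vdots & \vdots & \ddots & \vdots \\ y(2k-1) & y(2k) & \cdots & y(N+2k-2) \end{bmatrix} \in \mathbb{R}^{kp\times N}$$

where $k > n$, and the number of columns of the block matrices is $N$.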
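Let $k$ be the present time. As before, we write $Y_p = Y_{0|k-1}$ and $Y_f = Y_{k|2k-1}$. The sample covariance matrices of the given data are defined by

$$\Sigma_{ff} = \frac{1}{N}Y_fY_f^T, \qquad \Sigma_{fp} = \frac{1}{N}Y_fY_p^T, \qquad \Sigma_{pp} = \frac{1}{N}Y_pY_p^T$$

Also, consider the LQ decomposition of the form

$$\frac{1}{\sqrt N}\begin{bmatrix} Y_p \\ Y_f \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{8.76}$$

Then it follows that

$$\Sigma_{fp} = L_{21}L_{11}^T, \qquad \Sigma_{ff} = L_{21}L_{21}^T + L_{22}L_{22}^T, \qquad \Sigma_{pp} = L_{11}L_{11}^T$$

We see that the above sample covariance matrices $\Sigma_{fp}$, $\Sigma_{ff}$, $\Sigma_{pp}$ are finite-dimensional approximations to the infinite matrices $H$, $T_+$ and $T_-$ of (8.18), (8.19) and (8.20), respectively.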
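The data matrices and the LQ decomposition (8.76) map directly onto NumPy array operations. A minimal sketch, assuming the data are stored as an array `y` of shape `(p, N + 2k - 1)` whose column `t` holds $y(t)$; the LQ factors are obtained by applying `numpy.linalg.qr` to the transpose, which is one common way to compute an LQ decomposition. The random data below serve only to check the identities for $\Sigma_{fp}$, $\Sigma_{ff}$, $\Sigma_{pp}$, and the helper names are illustrative.

```python
import numpy as np


def data_matrices(y, k):
    """Past Y_{0|k-1} (block Toeplitz) and future Y_{k|2k-1} (block Hankel); y has shape (p, N + 2k - 1)."""
    p, T = y.shape
    N = T - 2 * k + 1
    Yp = np.vstack([y[:, k - 1 - i: k - 1 - i + N] for i in range(k)])  # block rows y(k-1), ..., y(0)
    Yf = np.vstack([y[:, k + i: k + i + N] for i in range(k)])          # block rows y(k), ..., y(2k-1)
    return Yp, Yf, N


def lq(W):
    """LQ factorization W = L Q^T, obtained from the reduced QR factorization W^T = Q R."""
    Q, R = np.linalg.qr(W.T)
    return R.T, Q


rng = np.random.default_rng(0)
y = rng.standard_normal((1, 200))   # hypothetical scalar data, only to exercise the algebra
k = 4
Yp, Yf, N = data_matrices(y, k)
Lfac, Q = lq(np.vstack([Yp, Yf]) / np.sqrt(N))                       # (8.76)
kp = Yp.shape[0]
L11, L21, L22 = Lfac[:kp, :kp], Lfac[kp:, :kp], Lfac[kp:, kp:]
```

Working with the triangular factors avoids forming the $N$-column products explicitly, which is the usual reason for preferring the LQ route when $N$ is large.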

The following numerical algorithm is based on the theory of balanced stochastic 
realization of Theorem 8.4. 


Stochastic Balanced Realization — Algorithm A 


Step 1: Compute square root matrices $L$ and $M$ of the covariance matrices $\Sigma_{ff}$ and $\Sigma_{pp}$ such that

$$\Sigma_{ff} = LL^T, \qquad \Sigma_{pp} = MM^T \tag{8.77}$$

⁴ It should be noted that although $Y_{k|2k-1}$ is a Hankel matrix, $Y_{0|k-1}$ is a Toeplitz matrix.
Step 2: Compute the SVD of the normalized covariance matrix $L^{-1}\Sigma_{fp}M^{-T}$ such that

$$L^{-1}\Sigma_{fp}M^{-T} = U\Sigma V^T \tag{8.78}$$

where $\Sigma$ is obtained by neglecting sufficiently small singular values, and hence the dimension of the state vector is given by $n = \dim\Sigma$.

Step 3: Define the extended observability and reachability matrices as

$$\mathcal{O}_k = LU\Sigma^{1/2}, \qquad \mathcal{C}_k = \Sigma^{1/2}V^TM^T \tag{8.79}$$

Step A4: Compute the estimates of $A$, $C$, $\bar C^T$ as

$$A = \mathcal{O}_1^\dagger\mathcal{O}_2, \qquad C = \mathcal{O}_k(1:p,\ :), \qquad \bar C^T = \mathcal{C}_k(:,\ 1:p) \tag{8.80}$$

where $\mathcal{O}_1 = \mathcal{O}_k(1:(k-1)p,\ :)$ and $\mathcal{O}_2 = \mathcal{O}_k(p+1:kp,\ :)$.

Step A5: Let $\Lambda(0) = \Sigma_{ff}(1:p,\ 1:p)$. Then the Kalman gain is given by

$$K = (\bar C^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1} \tag{8.81}$$

Step A6: By the formula in Theorem 8.4, we have an innovation representation of the form

$$x(t+1) = Ax(t) + Ke(t)$$

$$y(t) = Cx(t) + e(t)$$

where, from (8.53), the covariance matrix of the innovation process is given by $R = \Lambda(0) - C\Sigma C^T$.
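Algorithm A can be sketched in a few lines of NumPy. The sketch below assumes that the sample covariances are formed directly from $Y_p$ and $Y_f$ (the LQ route of (8.76) gives the same matrices), that the order $n$ is supplied by the user instead of being chosen by inspecting the singular values in Step 2, and that Cholesky factors serve as the square roots in Step 1. The test data come from a hypothetical ARMA(1,1) process, and the names `data_matrices` and `algorithm_a` are illustrative only.

```python
import numpy as np


def data_matrices(y, k):
    """Past Y_{0|k-1} and future Y_{k|2k-1}; y has shape (p, N + 2k - 1)."""
    p, T = y.shape
    N = T - 2 * k + 1
    Yp = np.vstack([y[:, k - 1 - i: k - 1 - i + N] for i in range(k)])
    Yf = np.vstack([y[:, k + i: k + i + N] for i in range(k)])
    return Yp, Yf, N


def algorithm_a(y, k, n):
    """Sketch of Stochastic Balanced Realization, Algorithm A (Steps 1-3 and A4-A6)."""
    p = y.shape[0]
    Yp, Yf, N = data_matrices(y, k)
    Sff, Sfp, Spp = Yf @ Yf.T / N, Yf @ Yp.T / N, Yp @ Yp.T / N
    L, M = np.linalg.cholesky(Sff), np.linalg.cholesky(Spp)      # Step 1, (8.77)
    Hn = np.linalg.solve(M, np.linalg.solve(L, Sfp).T).T           # L^{-1} Sfp M^{-T}
    U, s, Vt = np.linalg.svd(Hn)                                   # Step 2, (8.78)
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]                          # keep n singular values
    Ok = (L @ U) * np.sqrt(s)                                      # Step 3, (8.79)
    Ck = (np.sqrt(s)[:, None] * Vt) @ M.T
    A = np.linalg.pinv(Ok[:-p]) @ Ok[p:]                           # Step A4, (8.80)
    C, CbarT = Ok[:p, :], Ck[:, :p]
    Sigma = np.diag(s)
    lam0 = Sff[:p, :p]                                             # Step A5
    R = lam0 - C @ Sigma @ C.T                                     # (8.53)
    K = (CbarT - A @ Sigma @ C.T) @ np.linalg.inv(R)               # (8.81)
    return A, C, K, R, s                                           # Step A6: model (A, C, K, R)


# Hypothetical test data: y(t) = 0.7 y(t-1) + e(t) + 0.4 e(t-1), var e = 1
rng = np.random.default_rng(1)
T, burn = 60_000, 500
e = rng.standard_normal(T + burn)
yy = np.zeros(T + burn)
for t in range(1, T + burn):
    yy[t] = 0.7 * yy[t - 1] + e[t] + 0.4 * e[t - 1]
y = yy[burn:][None, :]

A, C, K, R, s = algorithm_a(y, k=10, n=1)
```

With this hypothetical data set, the estimate of $A$ should be near $0.7$ and $R$ near the true innovation variance $1$; the exact values depend on the random seed.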


Remark 8.1. Since Algorithm A is based on the stochastic balanced realization of Theorem 8.4, this algorithm is quite different from Algorithm 2 of Van Overschee and De Moor ([165], p. 87). In fact, in the latter algorithm, it is necessary to solve the ARE of (7.84), based on the obtained $(A, C, \bar C, \Lambda(0))$, to derive the Kalman gain $K$ from (7.85). However, as stated in Chapter 7, the estimate $(A, C, \bar C, \Lambda(0))$ obtained above may fail to be positive real, in which case the ARE of (7.84) does not have a stabilizing solution. In Algorithm A, by contrast, we exploit the fact that the $\Sigma$ obtained by the CCA is an approximate solution of the ARE of (8.55), so that we always get an innovation model.


We present an alternative algorithm that utilizes the estimate of the state vector. The algorithm is the same as Algorithm A through Step 3.


Stochastic Balanced Realization — Algorithm B 


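Step B4: Compute the estimate of the state vector
$$\hat{X}_k = \Sigma^{1/2}V^TM^{-1}Y_p \in \mathbb{R}^{n \times N}$$
where $Y_p$ is the past data matrix in (8.76), and define the matrices with $N-1$ columns as
$$\hat{X}_{k+1} = \hat{X}_k(:, 2:N), \qquad \hat{X}_k = \hat{X}_k(:, 1:N-1), \qquad Y_{k|k} = Y_{k|k}(:, 1:N-1)$$
where $Y_{k|k}$ denotes the block row of present outputs $y(t)$.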
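Step B5: Compute the estimate of $(A, C)$ by applying the least-squares method to
$$\begin{bmatrix} \hat{X}_{k+1} \\ Y_{k|k} \end{bmatrix} = \begin{bmatrix} A \\ C \end{bmatrix}\hat{X}_k + \begin{bmatrix} \rho_w \\ \rho_v \end{bmatrix}$$
where $\rho_w \in \mathbb{R}^{n \times (N-1)}$ and $\rho_v \in \mathbb{R}^{p \times (N-1)}$ are residuals, i.e.,
$$\begin{bmatrix} A \\ C \end{bmatrix} = \begin{bmatrix} \hat{X}_{k+1} \\ Y_{k|k} \end{bmatrix}\hat{X}_k^T\left(\hat{X}_k\hat{X}_k^T\right)^{-1}$$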


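Step B6: Compute the sample covariance matrices of residuals
$$\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} = \frac{1}{N-1}\begin{bmatrix} \rho_w\rho_w^T & \rho_w\rho_v^T \\ \rho_v\rho_w^T & \rho_v\rho_v^T \end{bmatrix} \qquad (8.82)$$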


Then, we solve the ARE associated with the Kalman filter [see (5.67)]
$$P = APA^T - (APC^T + S)(CPC^T + R)^{-1}(APC^T + S)^T + Q \qquad (8.83)$$
to get a stabilizing solution $P \geq 0$. Thus the Kalman gain is given by
$$K = (APC^T + S)(CPC^T + R)^{-1}$$
Step B7: The identified innovation model is given by
$$\hat{x}(t+1) = A\hat{x}(t) + K\hat{e}(t)$$
$$y(t) = C\hat{x}(t) + \hat{e}(t)$$
where $\mathrm{var}\{\hat{e}(t)\} = CPC^T + R$.
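Algorithm B can be sketched in the same hypothetical NumPy setting. Assumptions of this sketch: Steps 1 to 3 are recomputed inline, the state estimate uses the unnormalized past data matrix so that X X^T / N reproduces Σ exactly (the property cov{X̂_k} = Σ used in Remark 8.2), and the ARE (8.83) is solved by iterating the Riccati difference equation from P = 0 rather than by a dedicated ARE solver. The function names are hypothetical.

```python
import numpy as np

def data_matrices(y, k):
    """Hypothetical helper: future rows y(t..t+k-1), past rows y(t-1..t-k); y has shape (p, T)."""
    N = y.shape[1] - 2 * k + 1
    Yf = np.vstack([y[:, k + i : k + i + N] for i in range(k)])
    Yp = np.vstack([y[:, k - 1 - i : k - 1 - i + N] for i in range(k)])
    return Yf, Yp

def algorithm_b(y, k, n, iters=2000):
    """Sketch of Steps 1-3 and B4-B7; returns A, C, K, P, var(e), state estimate, canonical correlations."""
    p = y.shape[0]
    Yf, Yp = data_matrices(y, k)
    N = Yf.shape[1]
    # Steps 1-3, as in Algorithm A
    L = np.linalg.cholesky(Yf @ Yf.T / N)
    M = np.linalg.cholesky(Yp @ Yp.T / N)
    _, s, Vt = np.linalg.svd(np.linalg.solve(L, np.linalg.solve(M, (Yf @ Yp.T / N).T).T))
    # Step B4: state estimate X = Sigma^{1/2} V^T M^{-1} Yp, then the shifted matrices
    X = np.diag(np.sqrt(s[:n])) @ Vt[:n, :] @ np.linalg.solve(M, Yp)
    X1, X0, Y0 = X[:, 1:], X[:, :-1], Yf[:p, :-1]
    # Step B5: least-squares estimate of [A; C]
    Z = np.vstack([X1, Y0])
    AC = Z @ X0.T @ np.linalg.inv(X0 @ X0.T)
    A, C = AC[:n, :], AC[n:, :]
    # Step B6: residual covariance (8.82), partitioned into Q, S, R
    rho = Z - AC @ X0
    W = rho @ rho.T / (N - 1)
    Q, S, R = W[:n, :n], W[:n, n:], W[n:, n:]
    # ARE (8.83) by Riccati iteration started from P = 0
    P = np.zeros((n, n))
    for _ in range(iters):
        G = A @ P @ C.T + S
        P = A @ P @ A.T - G @ np.linalg.solve(C @ P @ C.T + R, G.T) + Q
    K = (A @ P @ C.T + S) @ np.linalg.inv(C @ P @ C.T + R)
    return A, C, K, P, C @ P @ C.T + R, X, s

def simulate_arma(T, seed=0):
    """Example 8.1 ARMA model with zero initial conditions."""
    e = np.random.default_rng(seed).standard_normal(T)
    y = np.zeros(T)
    for t in range(T):
        y[t] = e[t]
        if t >= 1:
            y[t] += 1.5 * y[t - 1] - 0.5 * e[t - 1]
        if t >= 2:
            y[t] += -0.7 * y[t - 2] + 0.3 * e[t - 2]
    return y[None, :]

A, C, K, P, Re, X, s = algorithm_b(simulate_arma(20000), k=10, n=2)
print(np.round(X @ X.T / X.shape[1], 4))                 # reproduces diag(sigma_1, sigma_2)
print(np.round(np.abs(np.linalg.eigvals(A - K @ C)), 4)) # moduli of the closed-loop eigenvalues
```

Starting the Riccati iteration at P = 0 keeps every iterate nonnegative definite, because the residual covariance (8.82) is nonnegative definite.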


Remark 8.2. In Algorithm B, the covariance matrix obtained by (8.82) is always nonnegative definite, so that the stabilizing solution $P \geq 0$ of the ARE of (8.83) exists. Thus we can always derive an approximate balanced innovation model, because $\mathrm{cov}\{\hat{X}_k\} = \Sigma$.

In Algorithm 3 of Van Overschee and De Moor ([165], p. 90), however, one must solve the Lyapunov equation
$$\Sigma^s = A\Sigma^sA^T + Q$$
under the assumption that $A$ obtained in Step B5 is stable. By using the solution $\Sigma^s > 0$, the matrices $\bar{C}$ and $\Lambda(0)$ are then computed by
$$\bar{C} = C\Sigma^sA^T + S^T, \qquad \Lambda(0) = C\Sigma^sC^T + R$$


to get the data $(A, C, \bar{C}, \Lambda(0))$. The rest of Algorithm 3 solves the ARE of (7.84) to obtain the Kalman gain, as stated in Remark 8.1. It should be noted that Algorithm 3 works under the assumption that the estimated $A$ is stable; otherwise it does not provide any estimates. Hence, Algorithm B derived here is somewhat different from Algorithm 3 [165].
8.8 Numerical Results


We show some simulation results using simple models; the first one is a 2nd-order 
ARMA model, and the second one a 3rd-order state space model. 


Example 8.1. Consider the ARMA model described by
$$y(t) - 1.5y(t-1) + 0.7y(t-2) = e(t) - 0.5e(t-1) + 0.3e(t-2)$$


where $e$ is a zero-mean white Gaussian noise with unit variance. We have generated time series data under zero initial conditions. By using Algorithm A with $k = 10$, we have identified innovation models, from which transfer functions are computed. Table 8.2 shows the canonical correlations $\sigma_i$, $i = 1, \ldots, 6$, between the future and past versus the number of data $N$, where $N = \infty$ means that the exact canonical correlations are computed by using the relation derived in Theorem 8.3 (the $\sigma_i$ are the square roots of the eigenvalues of the product of the state covariance matrices of the forward and backward innovation models). We observe from Table 8.2 that, though the values of the first two canonical correlations $\sigma_1$ and $\sigma_2$ do not change very much, the other canonical correlations $\sigma_3, \sigma_4, \ldots$ get smaller as the number of data $N$ increases. For $N \leq 1000$, we find that $\sigma_3$ and $\sigma_4$ are rather large, so that it is not easy to estimate the order of the ARMA model. However, as the number of data increases, the gap between $\sigma_2$ and $\sigma_3$ becomes larger, so that we can correctly estimate the order $n = 2$.


Table 8.2. Canonical correlations between the future and past 


N        σ1      σ2      σ3      σ4      σ5      σ6
500      0.9047  0.4916  0.2328  0.2302  0.1687  0.0865
1000     0.9095  0.5028  0.1685  0.1638  0.1322  0.0776
2000     0.9189  0.5121  0.0997  0.0781  0.0488  0.0415
5000     0.9170  0.5108  0.0760  0.0404  0.0392  0.0311
10000    0.9165  0.5087  0.0468  0.0297  0.0253  0.0224
20000    0.9137  0.5137  0.0342  0.0288  0.0217  0.0129
50000    0.9130  0.5070  0.0142  0.0122  0.0116  0.0064
∞        0.9133  0.5036  0.0000  0.0000  0.0000  0.0000
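The N = ∞ row can be cross-checked without Theorem 8.3 by applying the same CCA to exact covariances. In this sketch (NumPy assumed; the helper name `cov` is hypothetical) the ARMA model is written in observer canonical innovation form, the exact covariances Λ(l) are computed from the stationary state covariance, and the canonical correlations between a long but finite future and past (k = 60, an arbitrary choice) serve as an approximation of the infinite-dimensional ones.

```python
import numpy as np

# Example 8.1 in observer canonical (innovation) form:
# y(t) + a1 y(t-1) + a2 y(t-2) = e(t) + c1 e(t-1) + c2 e(t-2), var{e} = 1
a1, a2, c1, c2 = -1.5, 0.7, -0.5, 0.3
A = np.array([[-a1, 1.0], [-a2, 0.0]])
C = np.array([[1.0, 0.0]])
K = np.array([[c1 - a1], [c2 - a2]])

# Stationary state covariance Pi = A Pi A^T + K K^T, solved in vectorized (Kronecker) form
Pi = np.linalg.solve(np.eye(4) - np.kron(A, A), (K @ K.T).ravel()).reshape(2, 2)
CbarT = A @ Pi @ C.T + K                      # E{x(t+1) y(t)}

def cov(l):
    """Exact autocovariance Lambda(l) = E{y(t+l) y(t)} of the scalar ARMA process."""
    l = abs(l)
    if l == 0:
        return (C @ Pi @ C.T)[0, 0] + 1.0
    return (C @ np.linalg.matrix_power(A, l - 1) @ CbarT)[0, 0]

k = 60                                                                 # long but finite future/past
T = np.array([[cov(i - j) for j in range(k)] for i in range(k)])      # Toeplitz: Sigma_ff = Sigma_pp
H = np.array([[cov(i + j + 1) for j in range(k)] for i in range(k)])  # Hankel: Sigma_fp
L = np.linalg.cholesky(T)
sigma = np.linalg.svd(np.linalg.solve(L, np.linalg.solve(L, H.T).T), compute_uv=False)
print(np.round(sigma[:3], 4))
```

Since the exact Hankel matrix has rank 2, all canonical correlations beyond the second vanish up to rounding, in agreement with the zeros in the last row of the table.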


Now we consider the case where the number of data is fixed as $N = 10000$. If we take $k = 80$, then the first six canonical correlations are given by
$$\Sigma = \mathrm{diag}(0.9171, 0.5144, 0.1403, 0.1380, 0.1313, 0.1304)$$
Compared with $\mathrm{diag}(0.9165, 0.5087, 0.0468, 0.0297, 0.0253, 0.0224)$ in Table 8.2 ($N = 10000$), we see that though the first two canonical correlations $\sigma_1$ and $\sigma_2$ are not significantly affected, the values of $\sigma_3, \sigma_4, \ldots$ change considerably when a large value of $k$ is taken. This may be caused by the following reason: for a fixed $N$, the sample cross-covariance matrix $\Sigma_{fp}$ (or the covariance matrices $\Sigma_{ff}$ and $\Sigma_{pp}$) computed from (8.76) will lose the block Hankel (Toeplitz) property of the true covariance matrices as the number of block rows $k$ increases.


Table 8.3. Parameter estimation by Algorithm A

N        a1 (-1.5)  a2 (0.7)  c1 (-0.5)  c2 (0.3)
500      -1.3621    0.6147    -0.3842    0.4376
1000     -1.4146    0.6487    -0.4301    0.3795
2000     -1.4723    0.6742    -0.4418    0.3182
5000     -1.5109    0.7077    -0.4791    0.2865
10000    -1.5107    0.7035    -0.4900    0.2899
20000    -1.5152    0.7096    -0.5074    0.2859
50000    -1.4991    0.6989    -0.4940    0.3026

Table 8.3 displays the estimated parameters of the 2nd-order ARMA model. In the identification problem of Example 6.7, where both the input and output data are available, we obtained very good estimates from a small number of data, say, $N = 100$. However, as we can see from Table 8.3, a large number of data is needed to identify a time series model, where only the output data are available. This is also true when we use the stochastic realization algorithm given in Lemma 7.9, because accurate covariance data are needed to get good estimates of Markov models. Although not included here, quite similar results are obtained by using Algorithm B, which is based on the estimate of state vectors.


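Example 8.2. We show some simulation results for the 3rd-order state space model used in [165], which is given by
$$x(t+1) = \begin{bmatrix} 0.6 & 0.6 & 0 \\ -0.6 & 0.6 & 0 \\ 0 & 0 & 0.4 \end{bmatrix}x(t) + \begin{bmatrix} 0.17 \\ -0.15 \\ 0.28 \end{bmatrix}e(t)$$
$$y(t) = \begin{bmatrix} 0.78 & 0.53 & 1.0 \end{bmatrix}x(t) + e(t)$$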


where $e$ is a Gaussian white noise with mean zero and unit variance. As in Example 8.1, we have used Algorithm A with $k = 10$ to compute canonical correlations and estimates of transfer functions. The simulation results are shown in Tables 8.4 and 8.5. We observe that, except that the estimate of the parameter $c_3$ is rather poor, the simulation results for the 3rd-order system are similar to those of the 2nd-order system treated in Example 8.1. Also, we see that as the number of data $N$ increases, the canonical correlations get closer to their true values.
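The true parameter values in parentheses in the header of Table 8.5 follow from the state space model above: the transfer function from e to y is 1 + C(zI - A)^{-1}K = det(zI - A + KC)/det(zI - A), so the a_i and c_i are the coefficients of these two characteristic polynomials. A quick NumPy check, where `np.poly` returns the characteristic polynomial coefficients of a square matrix:

```python
import numpy as np

A = np.array([[0.6, 0.6, 0.0], [-0.6, 0.6, 0.0], [0.0, 0.0, 0.4]])
K = np.array([[0.17], [-0.15], [0.28]])
C = np.array([[0.78, 0.53, 1.0]])

a = np.poly(A)            # det(zI - A)      = z^3 + a1 z^2 + a2 z + a3
c = np.poly(A - K @ C)    # det(zI - A + KC) = z^3 + c1 z^2 + c2 z + c3
print(np.round(a, 4))
print(np.round(c, 4))
```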


It should be noted that the above results depend heavily on the simulation conditions, so that they are to be understood as “examples.” Also, the stochastic subspace methods developed in the literature may fail in some cases; detailed analyses of stochastic subspace methods are found in [38, 58, 154].
Table 8.4. Canonical correlations between the future and past

N        σ1      σ2      σ3      σ4      σ5      σ6
2000     0.4042  0.2063  0.1299  0.0873  0.0805  0.0542
5000     0.4038  0.2010  0.0970  0.0739  0.0432  0.0339
10000    0.3899  0.2063  0.1095  0.0483  0.0299  0.0293
20000    0.3801  0.2181  0.1043  0.0340  0.0267  0.0214
50000    0.3840  0.2216  0.1060  0.0139  0.0114  0.0112
∞        0.3820  0.2244  0.1030  0.0000  0.0000  0.0000


Table 8.5. Parameter estimation by Algorithm A

N        a1 (-1.6)  a2 (1.2)  a3 (-0.288)  c1 (-1.2669)  c2 (0.6866)  c3 (-0.024)
2000     -1.5809    1.1360    -0.2214      -1.2068       0.5735       0.0748
5000     -1.4937    1.0825    -0.2228      -1.1214       0.5476       0.0461
10000    -1.5641    1.1693    -0.2611      -1.2081       0.6414       0.0190
20000    -1.6151    1.2161    -0.2901      -1.2812       0.7050       -0.0269
50000    -1.6046    1.2082    -0.2825      -1.2655       0.6890       -0.0120


8.9 Notes and References 


This chapter has reconsidered the stochastic realization problem based on the
canonical correlation analysis (CCA) due to Akaike [2,3]. Also, we have derived 
forward and backward innovation representations of a stationary process, and 
discussed a stochastic balanced realization problem, including a model reduction 
of stochastic systems. 

In Section 8.1 we have reviewed the basic idea of the CCA based on [14, 136]. 
The stochastic realization problem is restated in Section 8.2. The development 
of Section 8.3 is based on the pioneering works of Akaike [2-4]. In Section 8.4, 
we have discussed canonical correlations between the future and the past of a 
stationary process. We have shown that they are determined by the square roots 
of eigenvalues of the product of two state covariance matrices of the forward and 
the backward innovation models; see Table 4.1 and [39]. 


Section 8.5 is devoted to balanced stochastic realizations based on Desai et al. 
[42,43], Aoki [15], and Lindquist and Picci [106, 107]. By extending the results of 
[106, 107], we have also developed stochastic subspace identification algorithms 
based on the LQ decomposition in Hilbert space of time series [151,152], and 
stochastic balanced realizations on a finite interval [153, 154]. 

Section 8.6 has considered reduced stochastic realizations based on [43, 106]. An 
earlier result on model reduction is due to [50], and a survey on model reduction 
is given in [18]. The relation between the CCA and phase matching problems has 
been discussed in [64, 77]. Some applications to economic time series analysis are
found in [16]. 


In Section 8.7, the stochastic subspace identification algorithms are derived based on the balanced realization theorem (Theorem 8.4); see also the algorithm in [112]. Section 8.8 includes some numerical results, showing that a fairly large number of data is needed to obtain good estimates of time series models. Moreover, the Appendix (Section 8.11) includes a proof of Lemma 8.5.


8.10 Problems 


8.1 Show that the result of Lemma 8.1 is compactly written as 
De? Oe || dee eg | SO ln Pe, 
Oe | Bae on | Oe ie 


det | pr 7 | (Pap enh) 


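8.2 Let $\mathcal{Y}$ be a Hilbert space, and let $\mathcal{B} = \mathrm{span}\{b\}$ be a subspace of $\mathcal{Y}$. Show that the orthogonal projection of $a \in \mathcal{Y}$ onto $\mathcal{B}$ is equivalent to finding $K$ such that $\|a - Kb\|$ is minimized with respect to $K$, and that the optimal $K$ is given by
$$K = E\{ab^T\}\left(E\{bb^T\}\right)^{-1}$$
and we have $E\{a \mid \mathcal{B}\} = Kb$.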


8.3 Compute the covariance matrices of the output process y for the three realizations
given in Theorems 8.4, 8.5 and Lemma 8.5, and show that these are all given by 
(7.7). 

8.4 [43] Suppose that $y$ is scalar in Theorem 8.4, and that the canonical correlations $\{\sigma_i, i = 1, \ldots, n\}$ are distinct. Prove that there exists a matrix $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$ such that
$$A = SA^TS, \qquad \bar{C} = CS$$


8.5 In Subsection 8.5.1, consider the following two factorizations $T = L_1L_1^T = L_2L_2^T$ and $T_- = M_1M_1^T = M_2M_2^T$. Then, the SVD of the normalized block Hankel matrix gives
$$H = L_1U_1\Sigma V_1^TM_1^T = L_2U_2\Sigma V_2^TM_2^T$$
Suppose that the canonical correlations $\{\sigma_i, i = 1, \ldots, n\}$ are distinct. Let the two realizations of $y$ be given by $(A_i, C_i, \bar{C}_i, K_i, R_i)$, $i = 1, 2$. Show that there exists a matrix $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$ such that
$$A_2 = SA_1S, \quad C_2 = C_1S, \quad \bar{C}_2 = \bar{C}_1S, \quad K_2 = SK_1, \quad R_2 = R_1$$
8.11 Appendix: Proof of Lemma 8.5
1° From (8.59a), the covariance matrix of $x_b$ is given by
$$E\{x_b(t+l)x_b^T(t)\} = \begin{cases} \Sigma A^l, & l = 0, 1, \ldots \\ (A^T)^{|l|}\Sigma, & l = -1, -2, \ldots \end{cases} \qquad (8.84)$$


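Define $w^*(t) := x^*(t+1) - Ax^*(t)$. Note that $x_b(t-1) \in E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\} \subset \mathcal{Y}_t^+$ and $x_b(t) \in E\{\mathcal{Y}_{t+1}^- \mid \mathcal{Y}_{t+1}^+\} \subset \mathcal{Y}_{t+1}^+ \subset \mathcal{Y}_t^+$. Then, since $x^*(t) = \Sigma^{-1}x_b(t-1)$, we get
$$w^*(t) = \Sigma^{-1}x_b(t) - A\Sigma^{-1}x_b(t-1) \in \mathcal{Y}_t^+$$
and hence for $l = 0, 1, \ldots$,
$$w^*(t+l) = \Sigma^{-1}x_b(t+l) - A\Sigma^{-1}x_b(t+l-1) \in \mathcal{Y}_{t+l}^+ \subset \mathcal{Y}_t^+ \qquad (8.85)$$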
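Also, from (8.84),
$$\begin{aligned} E\{w^*(t+l)(x^*(t))^T\} &= E\{x^*(t+l+1)(x^*(t))^T\} - AE\{x^*(t+l)(x^*(t))^T\} \\ &= \Sigma^{-1}E\{x_b(t+l)x_b^T(t-1)\}\Sigma^{-1} - A\Sigma^{-1}E\{x_b(t+l-1)x_b^T(t-1)\}\Sigma^{-1} \\ &= A^{l+1}\Sigma^{-1} - A \cdot A^l\Sigma^{-1} = 0, \qquad l = 0, 1, \ldots \end{aligned}$$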


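Since $x^*(t) = \Sigma^{-1}x_b(t-1)$, it follows that
$$E\{w^*(t+l)x_b^T(t-1)\} = 0, \qquad l = 0, 1, \ldots$$
implying that $w^*(t+l) \perp \mathrm{span}\{x_b(t-1)\} = E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$. Since $w^*(t+l) \in \mathcal{Y}_t^+$ by (8.85), this shows that $w^*(t+l) \perp \mathcal{Y}_t^-$. Hence, the following relation holds.
$$w^*(t+l) \perp x^*(t),\ \mathcal{Y}_t^-, \qquad l = 0, 1, \ldots \qquad (8.86)$$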
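2° Define $e^*(t) := y(t) - Cx^*(t)$. Since $x^*(t) = \Sigma^{-1}x_b(t-1) \in \mathcal{Y}_t^+$, we have $e^*(t) \in \mathcal{Y}_t^+$. Also, from (8.59),
$$\begin{aligned} E\{y(t)(x^*(t))^T\} &= E\{[\bar{C}x_b(t) + e_b(t)]x_b^T(t-1)\}\Sigma^{-1} \\ &= \bar{C}E\{x_b(t)x_b^T(t-1)\}\Sigma^{-1} + E\{e_b(t)x_b^T(t-1)\}\Sigma^{-1} \\ &= \bar{C}\Sigma A\Sigma^{-1} + E\{e_b(t)[x_b^T(t)A + e_b^T(t)\bar{K}^T]\}\Sigma^{-1} \\ &= \bar{C}\Sigma A\Sigma^{-1} + E\{e_b(t)e_b^T(t)\}\bar{K}^T\Sigma^{-1} \\ &= \bar{C}\Sigma A\Sigma^{-1} + (C - \bar{C}\Sigma A)\Sigma^{-1} = C\Sigma^{-1} \end{aligned}$$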
Hence, we have
$$E\{e^*(t)(x^*(t))^T\} = E\{y(t)(x^*(t))^T\} - CE\{x^*(t)(x^*(t))^T\} = C\Sigma^{-1} - C\Sigma^{-1} = 0$$


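This implies that $e^*(t) \perp x^*(t)$. Thus, for any $t$, we have $e^*(t) \perp E\{\mathcal{Y}_t^- \mid \mathcal{Y}_t^+\}$ and $e^*(t) \in \mathcal{Y}_t^+$. Hence, similarly to the proof in 1°, we get
$$e^*(t) \perp \mathcal{Y}_t^- \quad \Longrightarrow \quad e^*(t+l) \perp \mathcal{Y}_t^-, \qquad l = 0, 1, \ldots$$
Since, by definition, $e^*(t+l) \in \mathcal{Y}_{t+l}^+ \subset \mathcal{Y}_t^+$ and $e^*(t+l) \perp \mathcal{Y}_t^-$, it follows that $e^*(t+l) \perp x^*(t)$, $l = 0, 1, \ldots$. Thus, summarizing the above, the following relation holds.
$$e^*(t+l) \perp x^*(t),\ \mathcal{Y}_t^-, \qquad l = 0, 1, \ldots \qquad (8.87)$$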
3° It follows from (8.86) and (8.87) that for $h = 1, 2, \ldots$,
$$\begin{aligned} E\{w^*(t+h)(w^*(t))^T\} &= E\{w^*(t+h)[x^*(t+1) - Ax^*(t)]^T\} = 0 \\ E\{w^*(t+h)(e^*(t))^T\} &= E\{w^*(t+h)[y(t) - Cx^*(t)]^T\} = 0 \\ E\{e^*(t+h)(e^*(t))^T\} &= E\{e^*(t+h)[y(t) - Cx^*(t)]^T\} = 0 \\ E\{e^*(t+h)(w^*(t))^T\} &= E\{e^*(t+h)[x^*(t+1) - Ax^*(t)]^T\} = 0 \end{aligned}$$
This implies that the joint process $(w^*(t), e^*(t))$ is white noise.
Now we compute the covariance matrices of $w^*(t)$ and $e^*(t)$. The covariance matrix of $w^*(t)$ is given by
$$\begin{aligned} Q^* &= E\{w^*(t)(w^*(t))^T\} = E\{[x^*(t+1) - Ax^*(t)][x^*(t+1) - Ax^*(t)]^T\} \\ &= \Sigma^{-1} - A\Sigma^{-1}A^T \end{aligned}$$
Noting that $x^*(t) \perp e^*(t)$, we have
$$\begin{aligned} S^* &= E\{w^*(t)(e^*(t))^T\} = E\{[x^*(t+1) - Ax^*(t)](e^*(t))^T\} \\ &= E\{x^*(t+1)[y(t) - Cx^*(t)]^T\} \\ &= \Sigma^{-1}E\{x_b(t)[\bar{C}x_b(t) + e_b(t) - C\Sigma^{-1}x_b(t-1)]^T\} \\ &= \Sigma^{-1}E\{x_b(t)x_b^T(t)\}\bar{C}^T - \Sigma^{-1}E\{x_b(t)x_b^T(t-1)\}\Sigma^{-1}C^T \\ &= \bar{C}^T - A\Sigma^{-1}C^T \end{aligned}$$
Also, the covariance matrix of $e^*(t)$ is given by
$$R^* = E\{e^*(t)(e^*(t))^T\} = E\{[y(t) - Cx^*(t)][y(t) - Cx^*(t)]^T\} = \Lambda(0) - C\Sigma^{-1}C^T$$
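It therefore follows that
$$\begin{bmatrix} Q^* & S^* \\ (S^*)^T & R^* \end{bmatrix} = \begin{bmatrix} \Sigma^{-1} - A\Sigma^{-1}A^T & \bar{C}^T - A\Sigma^{-1}C^T \\ (\bar{C}^T - A\Sigma^{-1}C^T)^T & \Lambda(0) - C\Sigma^{-1}C^T \end{bmatrix} \qquad (8.88)$$
4° Finally, by using the ARE of (8.62), i.e.,
$$\Sigma = A^T\Sigma A + (C^T - A^T\Sigma\bar{C}^T)(\Lambda(0) - \bar{C}\Sigma\bar{C}^T)^{-1}(C^T - A^T\Sigma\bar{C}^T)^T \qquad (8.89)$$
we can derive, as shown below, the ARE of (8.65):
$$\Sigma^{-1} = A\Sigma^{-1}A^T + (\bar{C}^T - A\Sigma^{-1}C^T)(\Lambda(0) - C\Sigma^{-1}C^T)^{-1}(\bar{C}^T - A\Sigma^{-1}C^T)^T \qquad (8.90)$$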


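This equation implies that the block matrix of (8.88) is degenerate, i.e., $Q^* = S^*(R^*)^{-1}(S^*)^T$, so that there exists a linear relation between $w^*(t)$ and $e^*(t)$. Hence, we have
$$\begin{aligned} w^*(t) &= E\{w^*(t) \mid e^*(t)\} \\ &= E\{w^*(t)(e^*(t))^T\}\left(E\{e^*(t)(e^*(t))^T\}\right)^{-1}e^*(t) \\ &= S^*(R^*)^{-1}e^*(t) = K^*e^*(t) \end{aligned}$$
This completes the proof of Lemma 8.5.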


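Derivation of Equation (8.90). As shown in Section 5.8 (see Problem 5.7), the ARE of (8.89) is expressed as
$$\Sigma = F^T\Sigma F + F^T\Sigma\bar{C}^T(\Lambda(0) - \bar{C}\Sigma\bar{C}^T)^{-1}\bar{C}\Sigma F + C^T\Lambda^{-1}(0)C$$
where $F := A - \bar{C}^T\Lambda^{-1}(0)C$. Using the matrix inversion lemma of (5.10) yields
$$\begin{aligned} \Sigma - C^T\Lambda^{-1}(0)C &= F^T[\Sigma + \Sigma\bar{C}^T(\Lambda(0) - \bar{C}\Sigma\bar{C}^T)^{-1}\bar{C}\Sigma]F \\ &= F^T[\Sigma^{-1} - \bar{C}^T\Lambda^{-1}(0)\bar{C}]^{-1}F \end{aligned} \qquad (8.91)$$
Again, using the matrix inversion lemma, the inverse of the left-hand side of (8.91) becomes
$$[\Sigma - C^T\Lambda^{-1}(0)C]^{-1} = \Sigma^{-1} + \Sigma^{-1}C^T(\Lambda(0) - C\Sigma^{-1}C^T)^{-1}C\Sigma^{-1}$$
Suppose that $F$ is invertible. Then, by computing the inverse of the right-hand side of (8.91) and rearranging the terms, we get
$$\Sigma^{-1} = F[\Sigma^{-1} + \Sigma^{-1}C^T(\Lambda(0) - C\Sigma^{-1}C^T)^{-1}C\Sigma^{-1}]F^T + \bar{C}^T\Lambda^{-1}(0)\bar{C}$$
This is equivalent to (8.90).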
Subspace Identification


9 
Subspace Identification (1) - ORT 


This chapter deals with the stochastic realization problem for a stationary stochastic 
process with exogenous inputs. As preliminaries, we review projections in a Hilbert 
space, and explain the feedback-free conditions and the PE condition to be satisfied 
by input signals. The output process is then decomposed into the deterministic and 
stochastic components; the former is obtained by the orthogonal projection of the 
output process onto the Hilbert space spanned by the exogenous inputs, while the 
latter is obtained by the complementary projection. By a geometric procedure, we 
develop a minimal state space model of the output process with a very natural block 
structure, in which the plant and the noise model are independently parametrized. 
Subspace algorithms are then derived based on this convenient model structure. 
Some numerical results are included to show the applicability of the present algo- 
rithm. 


9.1 Projections 


We briefly review projections in a Hilbert space and present some related facts that will be used in the following. Let $x \in \mathbb{R}^n$ be a random vector. Let the second-order moment of $x$ be defined by
$$E\{\|x\|^2\} = \sum_{i=1}^{n} E\{x_i^2\}$$
where $E\{\cdot\}$ denotes the mathematical expectation. Let a set of random vectors with finite second-order moments be defined by
$$\mathcal{H} = \{x \mid E\{\|x\|^2\} < \infty\}$$
Then the mean-square norm of $x \in \mathcal{H}$ is given by $\|x\|_{\mathcal{H}} = \sqrt{E\{\|x\|^2\}}$. It is well known that $\mathcal{H}$ is a linear space, and by completing the linear space with this norm, we have a Hilbert space generated by random vectors with finite second moments.
Let $a$, $b$ be elements of $\mathcal{H}$, and let $\mathcal{A}$, $\mathcal{B}$ be subspaces of $\mathcal{H}$. If $E\{ab^T\} = 0$, we say that $a$ and $b$ are orthogonal. Also, if $E\{ab^T\} = 0$ holds for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$, then we say that $\mathcal{A}$ and $\mathcal{B}$ are orthogonal, and we write $\mathcal{A} \perp \mathcal{B}$. Moreover, $\mathcal{A} \vee \mathcal{B}$ denotes the vector sum $\{a + b \mid a \in \mathcal{A},\ b \in \mathcal{B}\}$, $\mathcal{A} + \mathcal{B}$ denotes the direct sum ($\mathcal{A} \cap \mathcal{B} = \{0\}$), and $\mathcal{A} \oplus \mathcal{B}$ the orthogonal sum ($\mathcal{A} \perp \mathcal{B}$). The symbol $\mathcal{A}^\perp$ denotes the orthogonal complement of the subspace $\mathcal{A}$ in $\mathcal{H}$, and $\mathrm{span}\{a, b\}$ denotes the Hilbert space generated by all the linear combinations of random vectors $a$ and $b$. If infinitely many random vectors are involved, we write $\mathrm{span}\{a_1, a_2, \ldots\}$.

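Let $\mathcal{A}$ and $\mathcal{B}$ be subspaces of $\mathcal{H}$. Then, the orthogonal projection of $a \in \mathcal{A}$ onto $\mathcal{B}$ is denoted by $E\{a \mid \mathcal{B}\}$. If $\mathcal{B} = \mathrm{span}\{b\}$, the orthogonal projection is written as
$$E\{a \mid \mathcal{B}\} = E\{ab^T\}\left(E\{bb^T\}\right)^{\dagger}b$$
where $(\cdot)^{\dagger}$ denotes the pseudo-inverse. The orthogonal projection onto the orthogonal complement $\mathcal{B}^\perp$ is denoted by $E\{a \mid \mathcal{B}^\perp\} = a - E\{a \mid \mathcal{B}\}$. Also, the orthogonal projection of the space $\mathcal{A}$ onto $\mathcal{B}$ is denoted by $E\{\mathcal{A} \mid \mathcal{B}\}$.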
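If expectations are replaced by sample averages over N realizations (an assumption made here only for illustration), the projection formula becomes an ordinary least-squares fit, and E{a | B^⊥} = a - E{a | B} is orthogonal to b by construction. A minimal NumPy sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
b = rng.standard_normal((2, N))                                   # N samples of b in R^2
a = np.array([[1.0, -2.0]]) @ b + 0.5 * rng.standard_normal((1, N))

Eab = a @ b.T / N                     # sample E{a b^T}
Ebb = b @ b.T / N                     # sample E{b b^T}
Kab = Eab @ np.linalg.pinv(Ebb)       # coefficient of the projection
proj = Kab @ b                        # samples of E{a | B}
resid = a - proj                      # samples of E{a | B-perp}
print(np.round(Kab, 3), np.abs(resid @ b.T / N).max())
```

The pseudo-inverse mirrors the formula above; it also covers the case where the components of b are linearly dependent.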


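Lemma 9.1. Let $\mathcal{B}, \mathcal{C} \subset \mathcal{H}$, and suppose that $a \in \mathcal{B} \vee \mathcal{C}$ and $\mathcal{B} \cap \mathcal{C} = \{0\}$ hold. Then, we have the decomposition formula
$$E\{a \mid \mathcal{B} \vee \mathcal{C}\} = E_{\|\mathcal{C}}\{a \mid \mathcal{B}\} + E_{\|\mathcal{B}}\{a \mid \mathcal{C}\} \qquad (9.1)$$
where $E_{\|\mathcal{C}}\{a \mid \mathcal{B}\}$ is the oblique projection of $a$ onto $\mathcal{B}$ along $\mathcal{C}$, and $E_{\|\mathcal{B}}\{a \mid \mathcal{C}\}$ the oblique projection of $a$ onto $\mathcal{C}$ along $\mathcal{B}$, as in Figure 9.1.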


Figure 9.1. Oblique projections


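We write the oblique projection of $\mathcal{A}$ onto $\mathcal{B}$ along $\mathcal{C}$ as $E_{\|\mathcal{C}}\{\mathcal{A} \mid \mathcal{B}\}$. If $\mathcal{B} \perp \mathcal{C}$, then the oblique projection reduces to the orthogonal projection onto $\mathcal{B}$.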
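A sample-based sketch of (9.1), again assuming NumPy and sample averages in place of expectations: the projection of a onto B ∨ C is written in the basis (b, c), and its two coordinates give the oblique projections onto B along C and onto C along B. The data and the helper `proj` are hypothetical; b and c are deliberately correlated so that the oblique and orthogonal projections onto B differ.

```python
import numpy as np

def proj(x, V):
    """Hypothetical helper: orthogonal projection of row vector x onto the row space of V."""
    return x @ V.T @ np.linalg.pinv(V @ V.T) @ V

rng = np.random.default_rng(0)
N = 1000
b = rng.standard_normal((1, N))
c = 0.8 * b + 0.6 * rng.standard_normal((1, N))     # correlated with b: B and C are not orthogonal
a = 2.0 * b - 3.0 * c + 0.5 * rng.standard_normal((1, N))

BC = np.vstack([b, c])
coef = a @ BC.T @ np.linalg.pinv(BC @ BC.T)          # E{a | B v C} in the basis (b, c)
obl_B = coef[:, :1] @ b                              # oblique projection onto B along C
obl_C = coef[:, 1:] @ c                              # oblique projection onto C along B
print(np.round(coef, 3))

c_perp = c - proj(c, b)                              # replace C by a subspace orthogonal to B
BCp = np.vstack([b, c_perp])
coef_p = a @ BCp.T @ np.linalg.pinv(BCp @ BCp.T)
print(np.allclose(coef_p[:, :1] @ b, proj(a, b)))    # True: oblique = orthogonal when B is orthogonal to C
```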


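Definition 9.1. Suppose that $a \in \mathcal{A}$ and $b \in \mathcal{B}$ satisfy the orthogonality condition
$$E\{(a - E\{a \mid \mathcal{C}\})(b - E\{b \mid \mathcal{C}\})^T\} = 0, \qquad \mathcal{C} \subset \mathcal{H} \qquad (9.2)$$
Then, we say that $a$ and $b$ are conditionally orthogonal with respect to $\mathcal{C}$. If (9.2) holds for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$, then we say that $\mathcal{A}$ and $\mathcal{B}$ are conditionally orthogonal with respect to $\mathcal{C}$. This is simply denoted by $\mathcal{A} \perp \mathcal{B} \mid \mathcal{C}$.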
Figure 9.2. Conditional orthogonality ($a \in \mathcal{A} \vee \mathcal{C}$, $b \in \mathcal{B} \vee \mathcal{C}$)


In Figure 9.2, let the orthogonal projections of $a$ and $b$ onto $\mathcal{C}$ be denoted by $c_a$ and $c_b$, respectively. Then, the condition (9.2) implies that $a - c_a$ and $b - c_b$ are orthogonal.


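Lemma 9.2. The conditional orthogonality $\mathcal{A} \perp \mathcal{B} \mid \mathcal{C}$ is equivalent to the following condition
$$E\{\mathcal{B} \mid \mathcal{A} \vee \mathcal{C}\} = E\{\mathcal{B} \mid \mathcal{C}\} \qquad (9.3)$$


Proof. The conditional orthogonality implies that (9.2) holds for all $a \in \mathcal{A}$, $b \in \mathcal{B}$. Since $(b - E\{b \mid \mathcal{C}\}) \perp \mathcal{C}$ and $E\{a \mid \mathcal{C}\} \in \mathcal{C}$, it follows from (9.2) that
$$E\{a(b - E\{b \mid \mathcal{C}\})^T\} = E\{(a + c)(b - E\{b \mid \mathcal{C}\})^T\} = 0 \qquad (9.4)$$
holds for all $a \in \mathcal{A}$, $b \in \mathcal{B}$ and $c \in \mathcal{C}$. Since $\mathcal{A} \vee \mathcal{C} = \{a + c \mid a \in \mathcal{A},\ c \in \mathcal{C}\}$, we see from (9.4) that $\mathcal{B} - E\{\mathcal{B} \mid \mathcal{C}\} \perp \mathcal{A} \vee \mathcal{C}$. Hence,
$$E\{\mathcal{B} \mid \mathcal{A} \vee \mathcal{C}\} = E\{E\{\mathcal{B} \mid \mathcal{C}\} \mid \mathcal{A} \vee \mathcal{C}\} \qquad (9.5)$$
However, since $\mathcal{C} \subset \mathcal{A} \vee \mathcal{C}$, the right-hand side equals $E\{\mathcal{B} \mid \mathcal{C}\}$. Conversely, if (9.3) holds, then we have (9.5), so that $\mathcal{B} - E\{\mathcal{B} \mid \mathcal{C}\} \perp \mathcal{A} \vee \mathcal{C}$. This implies that (9.4) holds for all $a$, $b$, $c$, and hence (9.2) holds.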
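Lemma 9.2 can be checked on a small constructed example. In this NumPy sketch (hypothetical data; sample averages play the role of expectations) three orthonormal sample vectors q1, q2, q3 are built by a QR factorization, and a, b are chosen so that their components orthogonal to C = span{q1} are orthogonal to each other. The projection of b onto A ∨ C then coincides with its projection onto C, as (9.3) states, while a vector violating (9.2) fails (9.3).

```python
import numpy as np

def proj(x, V):
    """Hypothetical helper: orthogonal projection of row vector x onto the row space of V."""
    return x @ V.T @ np.linalg.pinv(V @ V.T) @ V

rng = np.random.default_rng(0)
N = 200
Qm, _ = np.linalg.qr(rng.standard_normal((N, 3)))
q1, q2, q3 = (np.sqrt(N) * Qm[:, i][None, :] for i in range(3))  # q_i q_j^T / N = delta_ij

c = q1
a = q2 + 0.5 * q1        # a - E{a | C} = q2
b = q3 - 2.0 * q1        # b - E{b | C} = q3, orthogonal to q2: A and B conditionally orthogonal given C

print(np.allclose(proj(b, np.vstack([a, c])), proj(b, c)))          # True, condition (9.3)
b_bad = q2 + q3          # b_bad - E{b_bad | C} = q2 + q3 is not orthogonal to q2
print(np.allclose(proj(b_bad, np.vstack([a, c])), proj(b_bad, c)))  # False
```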


9.2 Stochastic Realization with Exogenous Inputs 


Consider a discrete-time stochastic system shown in Figure 9.3, where $u \in \mathbb{R}^m$ is the input vector, $y \in \mathbb{R}^p$ the output vector, and $\varepsilon$ the noise vector. We assume that $u$ and $y$ are second-order stationary random processes with mean zero and that they are available for identification. The output $y$ is also excited by the external noise $\varepsilon$, which is not directly accessible. We need the following assumption, whose meaning is explained in the next section.


Figure 9.3. Stochastic system with exogenous input


Assumption 9.1. There is no feedback from the output y to the input u. This is called 
a feedback-free condition. 


The stochastic realization problem with exogenous inputs is stated as follows. 


Stochastic Realization Problem 


Suppose that infinite data $\{u(t), y(t),\ t = 0, \pm 1, \ldots\}$ are given. The problem is to define a suitable state vector $x$ with minimal dimension and to derive a state space model with the input vector $u$ and the output vector $y$ of the form
$$x(t+1) = Ax(t) + Bu(t) + Ke(t) \qquad (9.6a)$$
$$y(t) = Cx(t) + Du(t) + e(t) \qquad (9.6b)$$


where e is the innovation process defined below (see Lemma 9.6). It should be noted 
that the stochastic realization problems considered in Chapters 7 and 8 are realiza- 
tions of stationary processes without exogenous inputs. 


Consider the joint input-output process $w = \begin{bmatrix} y \\ u \end{bmatrix} \in \mathbb{R}^d$, where $d := p + m$.


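Since we are given infinite input-output data, the exact covariance matrices of $w$ are given by
$$\Lambda_{ww}(l) = E\{w(t+l)w^T(t)\} = \begin{bmatrix} \Lambda_{yy}(l) & \Lambda_{yu}(l) \\ \Lambda_{uy}(l) & \Lambda_{uu}(l) \end{bmatrix}, \qquad l = 0, \pm 1, \ldots$$
and the spectral density matrix is
$$\Phi_{ww}(z) = \sum_{l=-\infty}^{\infty}\Lambda_{ww}(l)z^{-l} = \begin{bmatrix} \Phi_{yy}(z) & \Phi_{yu}(z) \\ \Phi_{uy}(z) & \Phi_{uu}(z) \end{bmatrix}$$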
We consider the prediction problem of $w(t+k)$, $k = 1, 2, \ldots$, based on the present and past observations $w(t), w(t-1), \ldots$ such that
$$\Sigma_k = \min_{f_i}\mathrm{cov}\left\{w(t+k) - \sum_{i=0}^{\infty} f_iw(t-i)\right\}$$
where $f_i \in \mathbb{R}^{d \times d}$ are coefficients. In general, for the minimum prediction error covariance matrices, we have $0 \leq \Sigma_1 \leq \Sigma_2 \leq \cdots$. If $\Sigma_1 > 0$, then the process $w$ is regular. If $\det\Sigma_1 = 0$, $w$ is called singular [138]; see also Section 4.5. If the spectral density matrix $\Phi_{ww}(z)$ has full rank, we simply say that $w$ has full rank. The regularity and full rank conditions imply that the joint process $w$ does not degenerate.


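Let the Hilbert space generated by $w$ be defined by
$$\mathcal{W} = \mathrm{span}\{w(\tau) \mid \tau = 0, \pm 1, \ldots\}$$
This space contains all linear combinations of the history of the process $w$. Also, we define Hilbert subspaces generated by $u$ and $y$ as
$$\mathcal{U} = \mathrm{span}\{u(\tau) \mid \tau = 0, \pm 1, \ldots\}, \qquad \mathcal{Y} = \mathrm{span}\{y(\tau) \mid \tau = 0, \pm 1, \ldots\}$$
Let $t$ be the present time, and define subspaces generated by the past and future of $u$ and $y$ as
$$\mathcal{U}_t^- = \mathrm{span}\{u(\tau) \mid \tau < t\}, \qquad \mathcal{Y}_t^- = \mathrm{span}\{y(\tau) \mid \tau < t\}$$
$$\mathcal{U}_t^+ = \mathrm{span}\{u(\tau) \mid \tau \geq t\}, \qquad \mathcal{Y}_t^+ = \mathrm{span}\{y(\tau) \mid \tau \geq t\}$$
It may be noted that, by convention, the present time $t$ is included in the future and not in the past.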


9.3 Feedback-Free Processes 


There is a long history of studies on feedback between two observed processes [24–26, 53, 63]. In this section, we provide the definition of the feedback-free property and consider some equivalent conditions for it.

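Suppose that the joint process is regular and of full rank. It therefore follows from the Wold decomposition theorem (Theorem 4.3) that the joint process $w$ is expressed as a moving average representation
$$\begin{bmatrix} y(t) \\ u(t) \end{bmatrix} = \sum_{i=0}^{\infty}\begin{bmatrix} A_i & B_i \\ C_i & D_i \end{bmatrix}\begin{bmatrix} \nu(t-i) \\ \eta(t-i) \end{bmatrix} \qquad (9.7)$$
where $A_i \in \mathbb{R}^{p \times p}$, $B_i \in \mathbb{R}^{p \times m}$, $C_i \in \mathbb{R}^{m \times p}$, $D_i \in \mathbb{R}^{m \times m}$ are constant matrices, and $\nu \in \mathbb{R}^p$ and $\eta \in \mathbb{R}^m$ are zero mean white noise vectors with covariance matrices
$$E\left\{\begin{bmatrix} \nu(t) \\ \eta(t) \end{bmatrix}\begin{bmatrix} \nu^T(s) & \eta^T(s) \end{bmatrix}\right\} = Q\delta_{ts}, \qquad Q > 0$$
Define $d \times d$ matrices $\Gamma_i := \begin{bmatrix} A_i & B_i \\ C_i & D_i \end{bmatrix}$, $i = 0, 1, \ldots$, and the $d \times d$ transfer matrix $\Gamma(z) := \sum_{i=0}^{\infty}\Gamma_iz^{-i}$.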
Theorems 4.3 and 4.4 assert that $\{\Gamma_i, i = 0, 1, \ldots\}$ are square summable and $\Gamma^{-1}(z)$ is analytic in $|z| > 1$. In the following, we further assume that both $\Gamma(z)$ and $\Gamma^{-1}(z)$ are stable, i.e., $\Gamma(z)$ is of minimal phase. Also, we assume that $\Gamma(z)$ is a rational matrix [24, 25], so that we consider only a class of finitely generated stationary processes, which is a subclass of regular full rank stationary processes. The condition of rationality of $\Gamma(z)$ is, however, relaxed in [26].

Now we provide the definition that there is no feedback from the output $y$ to the input $u$, and introduce some equivalent feedback-free conditions. The following definition is called the strong feedback-free property [26]; however, for simplicity, we call it the feedback-free property.


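Definition 9.2. There is no feedback from $y$ to $u$ for the joint process of (9.7) if the following conditions are satisfied.


(i) The covariance matrix $Q$ is block diagonal with
$$Q = \begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix}, \qquad Q_1 \in \mathbb{R}^{p \times p}, \quad Q_2 \in \mathbb{R}^{m \times m}$$


(ii) The moving average representation (9.7) is expressed as
$$\begin{bmatrix} y(t) \\ u(t) \end{bmatrix} = \sum_{i=0}^{\infty}\begin{bmatrix} A_i & B_i \\ 0 & D_i \end{bmatrix}\begin{bmatrix} \nu(t-i) \\ \eta(t-i) \end{bmatrix} \qquad (9.8)$$
so that the transfer matrix $\Gamma(z) = \begin{bmatrix} A(z) & B(z) \\ 0 & D(z) \end{bmatrix}$ is block upper triangular.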


Theorem 9.1. Suppose that the joint process $w$ is regular and of full rank. Then, the following conditions (i) to (iv) are equivalent.


(i) There is no feedback from the output vector $y$ to the input vector $u$.


(ii) The smoothed estimate of $y(t)$ based on the whole input data is causal, i.e.,
$$E\{y(t) \mid \mathcal{U}\} = E\{y(t) \mid \mathcal{U}_{t+1}^-\}, \qquad t = 0, \pm 1, \ldots \qquad (9.9)$$


(iii) In terms of the input $u$, the output process $y$ is expressed as
$$y(t) = \sum_{i=0}^{\infty}K_iu(t-i) + \sum_{i=0}^{\infty}L_iv(t-i) \qquad (9.10)$$
where
$$K(z) = \sum_{i=0}^{\infty}K_iz^{-i}, \qquad L(z) = \sum_{i=0}^{\infty}L_iz^{-i}$$
are rational matrices such that $L(z)$ has full rank, and $K(z)$, $L(z)$, $L^{-1}(z)$ are stable, and the processes $u$ and $v$ are uncorrelated, where $v \in \mathbb{R}^p$ is a zero mean white noise vector.
(iv) Given the past of $u$, the future of $u$ is uncorrelated with the past of $y$, so that the conditional orthogonality condition
$$\mathcal{U}_t^+ \perp \mathcal{Y}_t^- \mid \mathcal{U}_t^-, \qquad t = 0, \pm 1, \ldots \qquad (9.11)$$
holds. This condition is equivalent to
$$E\{\mathcal{U}_t^+ \mid \mathcal{Y}_t^- \vee \mathcal{U}_t^-\} = E\{\mathcal{U}_t^+ \mid \mathcal{U}_t^-\}, \qquad t = 0, \pm 1, \ldots \qquad (9.12)$$
which implies that the past of $y$ is irrelevant for the prediction of the future of $u$ given the past of $u$. This condition is due to Granger [63].


Proof. The proof is deferred to the Appendix, Subsection 9.10.1.


9.4 Orthogonal Decomposition of Output Process 


9.4.1 Orthogonal Decomposition 


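Suppose that Assumption 9.1 holds. Putting $\mathcal{A} = \mathcal{U}_t^+$, $\mathcal{B} = \mathcal{Y}_t^-$, and $\mathcal{C} = \mathcal{U}_t^-$, we get $\mathcal{A} \vee \mathcal{C} = \mathcal{U}$. It then follows from Lemma 9.2 and (9.11) that
$$E\{\mathcal{Y}_t^- \mid \mathcal{U}\} = E\{\mathcal{Y}_t^- \mid \mathcal{U}_t^-\}, \qquad t = 0, \pm 1, \ldots \qquad (9.13)$$
Since $y(t) \in \mathcal{Y}_{t+1}^-$, we have
$$E\{y(t) \mid \mathcal{U}\} = E\{y(t) \mid \mathcal{U}_{t+1}^-\}, \qquad t = 0, \pm 1, \ldots \qquad (9.14)$$
This is the same as the condition (ii) of Theorem 9.1.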
We now define the orthogonal decomposition of the output process y. 


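Definition 9.3. Consider the orthogonal projection of $y(t)$ onto $\mathcal{U}$ such that
$$y_d(t) = E\{y(t) \mid \mathcal{U}\} = E\{y(t) \mid \mathcal{U}_{t+1}^-\} \qquad (9.15)$$
Then, $y_d$ is called the deterministic component of $y$. Also, the complementary projection
$$\begin{aligned} y_s(t) &= y(t) - E\{y(t) \mid \mathcal{U}_{t+1}^-\} \\ &= y(t) - E\{y(t) \mid \mathcal{U}\} = E\{y(t) \mid \mathcal{U}^\perp\} \end{aligned} \qquad (9.16)$$
is called the stochastic component of $y$.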


The deterministic component $y_d$ is the orthogonal projection of the output $y$ onto the Hilbert space $\mathcal{U}$ spanned by the input process $u$, so that $y_d$ is the part of $y$ that is linearly related to the input process $u$. On the other hand, the stochastic component $y_s$ is the part of $y$ that is orthogonal to the data space $\mathcal{U}$; see Figure 9.4. Thus, though $y_s$ is causal, it is orthogonal to the whole input space.


Figure 9.4. Orthogonal decomposition


Lemma 9.3. Under Assumption 9.1, the output process $y$ is decomposed into the deterministic component $y_d$ and the stochastic component $y_s$. In fact,
$$y(t) = y_d(t) + y_s(t), \qquad t = 0, \pm 1, \ldots \qquad (9.17)$$
where $y_s(t) \perp y_d(\tau)$ holds for all $t, \tau = 0, \pm 1, \ldots$.
Proof. Immediate from (9.15) and (9.16). 


From this lemma, we see that if there is no feedback from the output to the input, a state space model for $y$ is expressed as the orthogonal sum of state space models for the deterministic component $y_d$ and the stochastic component $y_s$. It follows from Theorem 9.1 (iii) that $y_d$ and $y_s$ correspond to the first and the second terms of the right-hand side of (9.10), respectively.


9.4.2 PE Condition 


In this subsection, we consider the PE condition to be satisfied by the input process, 
which is one of the important conditions in system identification [109, 145]; see also 
Appendix B. 


Assumption 9.2. For each $t$, the input space $\mathcal{U}$ has the direct sum decomposition
$$\mathcal{U} = \mathcal{U}_t^- + \mathcal{U}_t^+ \qquad (9.18)$$
where $\mathcal{U}_t^- \cap \mathcal{U}_t^+ = \{0\}$.


The condition $\mathcal{U}_t^- \cap \mathcal{U}_t^+ = \{0\}$ is equivalent to the fact that the spectral density function of $u$ is positive definite on the unit circle [106], i.e.,
$$\Phi_{uu}(e^{j\omega}) \geq cI_m \ \text{ for all } \omega, \quad \text{for some } c > 0 \qquad (9.19)$$
In this case, the input $u$ satisfies the PE condition of infinite order.

The condition (9.19) is equivalent to the fact that all the canonical angles between the past and future spaces of the input are positive. It also follows from [64, 69] that a necessary and sufficient condition for the smallest canonical angle between $\mathcal{U}_t^+$ and $\mathcal{U}_t^-$ to be zero is that $\Phi_{uu}(z)$ has some zeros on the unit circle.
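Condition (9.19) can be illustrated with a hypothetical scalar MA(1) input u(t) = w(t) + b w(t-1), with w white of unit variance, whose spectrum is |1 + b e^{-jω}|^2. This NumPy sketch compares the minimum of the spectrum on the unit circle with the smallest eigenvalue of the k x k Toeplitz covariance matrix of u; such eigenvalues are bounded below by the minimum of the spectrum and approach it as k grows. For b = 1 the spectrum vanishes at ω = π, (9.19) fails, and the smallest eigenvalue tends to zero. The function names are hypothetical.

```python
import numpy as np

def min_toeplitz_eig(b, k):
    """Smallest eigenvalue of the k x k covariance matrix of u(t) = w(t) + b w(t-1)."""
    r0, r1 = 1.0 + b * b, b          # Lambda_uu(0), Lambda_uu(1); Lambda_uu(l) = 0 for |l| > 1
    T = r0 * np.eye(k) + r1 * (np.eye(k, k=1) + np.eye(k, k=-1))
    return np.linalg.eigvalsh(T).min()

def min_spectrum(b, grid=4096):
    """Minimum over the unit circle of Phi_uu(e^{jw}) = |1 + b e^{-jw}|^2."""
    w = np.linspace(-np.pi, np.pi, grid)
    return np.min(np.abs(1.0 + b * np.exp(-1j * w)) ** 2)

for b in (0.5, 1.0):
    print(b, round(min_spectrum(b), 4),
          [round(min_toeplitz_eig(b, k), 5) for k in (10, 100, 200)])
```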
Remark 9.1. The above assumption is too restrictive for many practical cases, and
we could instead assume the PE condition of sufficiently high order and the finite 
dimensionality of the underlying “true” system. The reason for choosing the above 
condition is that it allows very simple proofs in the mathematical statements below, 
and it does not require the finite dimensionality assumption on the “true” systems. 


Lemma 9.4. Under Assumptions 9.1 and 9.2, the deterministic component $y_d$ of (9.15) is expressed as
$$y_d(t) = \sum_{i=0}^{\infty}G_iu(t-i) \qquad (9.20)$$
where $G_i \in \mathbb{R}^{p \times m}$ are constant matrices with $G(z) = \sum_{i=0}^{\infty}G_iz^{-i}$ stable.
Proof. This fact is essentially shown in the proof of Theorem 9.1 (iii). From (9.15), we have $y_d(t) \in \mathcal{U}_{t+1}^-$, so that it is expressed as a linear combination of the present and past inputs as in (9.20).

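The stability of $G(z)$ can also be proved as follows. The optimality condition for $\{G_i, i = 0, 1, \ldots\}$ is given by
$$E\left\{\left(y(t) - \sum_{i=0}^{\infty}G_iu(t-i)\right)u^T(t-j)\right\} = 0, \qquad j = 0, 1, \ldots$$
Thus it follows that
$$\Lambda_{yu}(j) = \sum_{i=0}^{\infty}G_i\Lambda_{uu}(j-i), \qquad j = 0, 1, \ldots$$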


Since the above equation is a discrete-time Wiener-Hopf equation, we can solve it by using the spectral factorization technique. Suppose that the spectral density matrix $\Phi_{uu}(z)$ is factored as
$$\Phi_{uu}(z) = \Theta(z)\Theta^T(z^{-1})$$
where $\Theta(z)$ is of minimal phase. It then follows from [11] that the optimal transfer matrix $G(z)$ is given by
$$G(z) = \left[\Phi_{yu}(z)\Theta^{-T}(z^{-1})\right]_+\Theta^{-1}(z) \qquad (9.21)$$
where $[\cdot]_+$ denotes the operation that extracts the causal part of a transfer matrix. Thus the stability of $G(z)$ follows from (9.21).


Lemma 9.5. The deterministic component $y_d$ and the stochastic component $y_s$ of (9.15) and (9.16) are mutually uncorrelated second-order stationary processes, and are regular and of full rank.


Proof. Lemma 9.3 shows that the two components are uncorrelated. We see from (9.20) that $y_d = G(z)u$. Since $u$ is second-order stationary and $G(z)$ is stable, $y_d$ is a second-order stationary process. Thus $y_s := y - y_d$ is also second-order stationary. Moreover, since $y$ and $u$ are regular and of full rank, so are $y_d$ and $y_s$.
Remark 9.2. A finite sum approximation of $y_d$ of (9.20) is given by
$$\hat{y}_d(t) = \sum_{i=0}^{k-1}G_iu(t-i)$$
for a sufficiently large $k > 0$. It can be shown that this is easily computed by means of the LQ decomposition (see Section A.2).
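Remark 9.2 can be illustrated with a small simulation. In this sketch (NumPy assumed; the plant, the noise filter and all numbers are hypothetical) the output of a system without feedback is regressed on the k most recent inputs through an LQ decomposition, computed as the transpose of NumPy's QR factorization. The block L21 Q1 is the finite-sum approximation of y_d, L22 Q2 is orthogonal to the inputs used, and L21 L11^{-1} recovers the coefficients G_0, ..., G_{k-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 5000, 10
u = rng.standard_normal(T)                 # white input, so the PE condition holds
v = 0.5 * rng.standard_normal(T)           # noise source independent of u (no feedback)
yd, ys = np.zeros(T), np.zeros(T)
for t in range(T):
    # hypothetical plant G(z) = 1/(1 - 0.5 z^-1) and noise filter 1/(1 - 0.8 z^-1)
    yd[t] = u[t] + (0.5 * yd[t - 1] if t > 0 else 0.0)
    ys[t] = v[t] + (0.8 * ys[t - 1] if t > 0 else 0.0)
y = yd + ys

N = T - k + 1
Up = np.vstack([u[k - 1 - i : k - 1 - i + N] for i in range(k)])  # rows u(t), u(t-1), ..., u(t-k+1)
Y = y[k - 1 : k - 1 + N][None, :]                                 # row y(t)

Q, R = np.linalg.qr(np.vstack([Up, Y]).T)    # [Up; Y] = L Qt with L = R^T, Qt = Q^T
L, Qt = R.T, Q.T
L11, L21, L22 = L[:k, :k], L[k:, :k], L[k:, k:]
yd_hat = L21 @ Qt[:k]                        # projection of y(t) onto the k most recent inputs
ys_hat = L22 @ Qt[k:]                        # remainder, orthogonal to those inputs
G_hat = L21 @ np.linalg.inv(L11)             # estimates of [G_0, G_1, ..., G_{k-1}]
print(np.round(G_hat, 3))
```

For this hypothetical plant the true coefficients are G_i = 0.5^i, so the printed row can be compared with 1, 0.5, 0.25, and so on.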


9.5 State Space Realizations 


In order to obtain a realization of the stochastic component $y_s$, we can employ the results of Chapter 8. For the deterministic component $y_d$, however, the mathematical development of a state space realization is slightly more involved, since we must employ oblique projections due to the existence of the exogenous input $u$.

We begin with a realization of the stochastic component. 


9.5.1 Realization of Stochastic Component 
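Let the Hilbert space generated by $y_s$ be defined by
$$\mathcal{Y}^s = \mathrm{span}\{y_s(\tau) \mid \tau = 0, \pm 1, \ldots\} \subset \mathcal{U}^\perp$$
and let Hilbert subspaces generated by the past and future of $y_s$ be defined by
$$\mathcal{Y}_t^{s-} = \mathrm{span}\{y_s(\tau) \mid \tau < t\}, \qquad \mathcal{Y}_t^{s+} = \mathrm{span}\{y_s(\tau) \mid \tau \geq t\}$$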


It follows from the stochastic realization results in Chapter 8 that a necessary and 
sufficient condition that the stochastic component has a finite dimensional realization 
is that the predictor space 


ee See yet (9.22) 
is finite dimensional. 


Theorem 9.2. Define $\dim(\mathbf{X}^{s+/-}_t) = n_s$. Then the minimal dimension of realization is $n_s$. In terms of a basis vector $x_s(t)$ of the predictor space, a state space realization of the stochastic component $y_s$ is given by

$$x_s(t+1) = A_s x_s(t) + K_s e_s(t) \tag{9.23a}$$
$$y_s(t) = C_s x_s(t) + e_s(t) \tag{9.23b}$$

where $e_s$ is the innovation process for $y_s$, or the one-step prediction error defined by

$$e_s(t) = y_s(t) - E\{y_s(t) \mid \mathbf{Y}^{s-}_t\}$$

Proof. Immediate from Theorem 8.4 in Subsection 8.5.2.
The innovation representation (9.23) is called the stochastic subsystem. The following lemma shows that the innovation process $e_s$ is the same as the conditional innovation process $e$ of $y$.


Lemma 9.6. The innovation process $e_s$ is expressed as

$$e_s(t) = e(t) = y(t) - E\{y(t) \mid \mathbf{U} \vee \mathbf{Y}^-_t\} \tag{9.24}$$

Proof. By using the property of orthogonal projection, we can show that

$$\begin{aligned}
e(t) &= y(t) - E\{y(t) \mid \mathbf{U} \vee \mathbf{Y}^-_t\} = y(t) - E\{y(t) \mid \mathbf{U} \oplus \mathbf{Y}^{s-}_t\} \\
&= \big(y(t) - E\{y(t) \mid \mathbf{U}\}\big) - E\{y(t) \mid \mathbf{Y}^{s-}_t\} \\
&= y_s(t) - E\{y_s(t) + y_d(t) \mid \mathbf{Y}^{s-}_t\} \\
&= y_s(t) - E\{y_s(t) \mid \mathbf{Y}^{s-}_t\} = e_s(t)
\end{aligned}$$

where we used the fact that $y_d(t) \perp \mathbf{Y}^{s-}_t$.


9.5.2 Realization of Deterministic Component 


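A state space realization of the deterministic component should be a state space model with the input process $u$ and the output process $y_d$. In the following, we use the idea of N4SID described in Section 6.6 to construct a state space realization for the deterministic component of the output process.

As usual, let the Hilbert space generated by $y_d$ be defined by

$$\mathbf{Y}^d = \overline{\operatorname{span}}\{y_d(\tau) \mid \tau = 0, \pm 1, \ldots\} \subset \mathbf{U}$$

and let the Hilbert spaces spanned by the future and past of the deterministic component $y_d$ be defined by

$$\mathbf{Y}^{d+}_t = \overline{\operatorname{span}}\{y_d(\tau) \mid \tau \ge t\}, \quad \mathbf{Y}^{d-}_t = \overline{\operatorname{span}}\{y_d(\tau) \mid \tau < t\}$$

Definition 9.4. For any $t$, if a subspace $\mathbf{S}_t\ (\subset \mathbf{U}^-_t)$ satisfies the relation

$$E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{U}^-_t\} = E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{S}_t\} \tag{9.25}$$

then $\mathbf{S}_t$ is called an oblique splitting subspace for the pair $(\mathbf{Y}^{d+}_t, \mathbf{U}^-_t)$.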

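The space $\mathbf{S}_t$ satisfying (9.25) carries the information contained in $\mathbf{U}^-_t$ that is needed to predict the future outputs $y_d(t+l),\ l = 0, 1, \ldots$, so that it is a candidate for the state space. Also, the oblique predictor space for the deterministic component

$$\mathbf{X}^{d+/-}_t := E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{U}^-_t\} \tag{9.26}$$

is obviously contained in $\mathbf{U}^-_t$ and is oblique splitting. This can be proved similarly to the proof of Lemma 8.4. In fact, since $\mathbf{X}^{d+/-}_t \subset \mathbf{U}^-_t$, it follows from the property of projection that

$$E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{U}^-_t\} = \mathbf{X}^{d+/-}_t = E_{\|\mathbf{U}^+_t}\big\{E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{U}^-_t\} \mid \mathbf{X}^{d+/-}_t\big\} = E_{\|\mathbf{U}^+_t}\{\mathbf{Y}^{d+}_t \mid \mathbf{X}^{d+/-}_t\}$$

This shows that $\mathbf{X}^{d+/-}_t$ is oblique splitting by (9.25). Thus all the information in $\mathbf{U}^-_t$ that is related to the future is contained in the predictor space $\mathbf{X}^{d+/-}_t$.

We further define the extended space by

$$\hat{\mathbf{Y}}^{d+}_t := \mathbf{Y}^{d+}_t \vee \mathbf{X}^{d+/-}_t \tag{9.27}$$

Then, we have the following basic result.


Lemma 9.7. The predictor space $\mathbf{X}^{d+/-}_t$ defined by (9.26) is an oblique splitting subspace for $(\hat{\mathbf{Y}}^{d+}_t, \mathbf{U}^-_t)$, and

$$\mathbf{X}^{d+/-}_t = \hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_t \tag{9.28}$$

holds. Moreover, under Assumption 9.2, we have the following direct sum decomposition

$$\hat{\mathbf{Y}}^{d+}_t = (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_t) \dotplus (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^+_t) \tag{9.29}$$

Proof. The proof is deferred to the Appendix (Subsection 9.10.2).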


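Now we assume that $\dim(\mathbf{X}^{d+/-}_t) = n$ holds. Let the subspace generated by $u(t)$ be defined by

$$\mathbf{U}_t := \operatorname{span}\{u(t)\} \subset \mathbf{U}^+_t$$

It follows from Assumption 9.2 that $\mathbf{U}^-_{t+1} = \mathbf{U}^-_t \dotplus \mathbf{U}_t$ $(\mathbf{U}^-_t \cap \mathbf{U}_t = \{0\})$. Hence, we have a direct sum decomposition

$$\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_{t+1} = (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_t) \dotplus (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}_t) \tag{9.30}$$

where $\hat{\mathbf{Y}}^{d+}_{t+1} \subset \hat{\mathbf{Y}}^{d+}_t$ holds, so that

$$\mathbf{X}^{d+/-}_{t+1} = (\hat{\mathbf{Y}}^{d+}_{t+1} \cap \mathbf{U}^-_{t+1}) \subset (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_{t+1})$$

Since $(\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}_t) \subset \mathbf{U}_t$ holds, we see from (9.28) and (9.30) that

$$\mathbf{X}^{d+/-}_{t+1} \subset \mathbf{X}^{d+/-}_t \dotplus \mathbf{U}_t \tag{9.31}$$

It should be noted here that the right-hand side of the above equation is a direct sum, since $\mathbf{X}^{d+/-}_t \cap \mathbf{U}_t = \{0\}$ holds from $\mathbf{X}^{d+/-}_t \subset \mathbf{U}^-_t$.

Let $x_d(t) \in \mathbb{R}^n$ be a basis vector for $\mathbf{X}^{d+/-}_t$ and $x_d(t+1)$ be the shifted version of it. We see from (9.28) that $x_d(t+1)$ is a basis for the space $\mathbf{X}^{d+/-}_{t+1} = \hat{\mathbf{Y}}^{d+}_{t+1} \cap \mathbf{U}^-_{t+1}$.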
Figure 9.5. Direct sum decomposition of $x_d(t+1)$


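As in Figure 9.5, the projection of $x_d(t+1) \in \mathbf{X}^{d+/-}_{t+1}$ onto the subspaces in the right-hand side of (9.31) gives the state space equation

$$x_d(t+1) = A_d x_d(t) + B_d u(t) \tag{9.32}$$

where $A_d \in \mathbb{R}^{n\times n}$ and $B_d \in \mathbb{R}^{n\times m}$. Note that, since the right-hand side of (9.31) is a direct sum, (9.32) is a unique direct sum decomposition.

Since $y_d(t) \in \mathbf{Y}^{d+}_t \cap \mathbf{U}^-_{t+1}$, it follows from (9.27) and (9.30) that

$$y_d(t) \in \hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_{t+1} = (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_t) \dotplus (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}_t) \subset \mathbf{X}^{d+/-}_t \dotplus \mathbf{U}_t$$

Hence, the projection of $y_d(t)$ onto the two subspaces in the right-hand side of the above equation yields a unique output equation¹

$$y_d(t) = C_d x_d(t) + D_d u(t) \tag{9.33}$$

where $C_d \in \mathbb{R}^{p\times n}$ and $D_d \in \mathbb{R}^{p\times m}$ are constant matrices.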
Since the predictor space $\mathbf{X}^{d+/-}_t$ is included in $\mathbf{U}^-_t$, we see that $x_d(t) \in \mathbf{U}^-_t$. As in Lemma 9.4, it follows that $x_d(t)$ is expressed as

$$x_d(t) = \sum_{i=1}^{\infty} F_i u(t-i) \;\Longrightarrow\; x_d = F(z)u$$

where $F(z) = \sum_{i=1}^\infty F_i z^{-i}$ is stable and $F_i \in \mathbb{R}^{n\times m}$. Since $u$ is regular and stationary, so is $x_d$. Thus it follows from (9.32) that

$$x_d = (zI - A_d)^{-1} B_d u = F(z)u \;\Longrightarrow\; F(z) = (zI - A_d)^{-1} B_d$$

Since $F(z)$ is stable, if $(A_d, B_d)$ is reachable, all the eigenvalues of $A_d$ must be inside the unit disk, so that $A_d$ is stable.

Summarizing the above results, we have the following theorem.


Theorem 9.3. Suppose that the joint process $w$ has a rational spectral density matrix and that the input process $u$ satisfies Assumptions 9.1 and 9.2. Then, the predictor space $\mathbf{X}^{d+/-}_t$ has dimension $n$, and a state space model with the state vector $x_d(t) \in \mathbf{X}^{d+/-}_t$ is given by

$$x_d(t+1) = A_d x_d(t) + B_d u(t) \tag{9.34a}$$
$$y_d(t) = C_d x_d(t) + D_d u(t) \tag{9.34b}$$

where $A_d$ is stable. This is called the deterministic subsystem. Moreover, let $\bar{\mathbf{X}}_t$ be the state space of another realization of $y_d$, and let $\dim(\bar{\mathbf{X}}_t) = \bar{n}$. Then we have $\bar{n} \ge n$.

¹ The decomposition of $y_d(t)$ is obtained by replacing $x_d(t+1) \to y_d(t)$, $A_d x_d(t) \to C_d x_d(t)$ and $B_d u(t) \to D_d u(t)$ in Figure 9.5.
Proof. The first half of the theorem is obvious from the above. We shall prove the result for the dimension of the state spaces. Let $\bar{x}(t) \in \mathbb{R}^{\bar{n}}$ be a state vector, and let a realization for $y_d$ be given by

$$\bar{x}(t+1) = \bar{A}\bar{x}(t) + \bar{B}u(t)$$
$$y_d(t) = \bar{C}\bar{x}(t) + \bar{D}u(t)$$

The impulse response matrices of the system are defined by

$$W_i = \begin{cases} \bar{D}, & i = 0 \\ \bar{C}\bar{A}^{i-1}\bar{B}, & i = 1, 2, \ldots \end{cases}$$
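The impulse response matrices $W_i$ can be computed directly from any realization $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$. The short sketch below does this. The function name and the second-order example system are hypothetical.

```python
import numpy as np


def markov_parameters(A, B, C, D, count):
    """Return [W_0, ..., W_{count-1}] with W_0 = D and W_i = C A^{i-1} B for i >= 1."""
    W = [D]
    Ak_B = B                      # holds A^{i-1} B
    for _ in range(1, count):
        W.append(C @ Ak_B)
        Ak_B = A @ Ak_B
    return W


A = np.array([[0.5, 1.0], [0.0, 0.25]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
W = markov_parameters(A, B, C, D, 4)
print([w.item() for w in W])   # [0.0, 0.0, 1.0, 0.75]
```

Two realizations related by a similarity transformation produce identical $W_i$. This is why the proof can argue in terms of impulse responses and still compare state space dimensions.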


The following proof is related to that of the second half of Lemma 9.7; see Subsection 9.10.2. In terms of impulse responses, we have

$$y_d(t+k) = \sum_{i=-\infty}^{t-1} W_{t+k-i}u(i) + \sum_{i=t}^{t+k} W_{t+k-i}u(i) = y_P(t+k) + y_F(t+k), \quad k = 0, 1, \ldots \tag{9.35}$$

where $y_P(t+k) \in \mathbf{U}^-_t$ and $y_F(t+k) \in \mathbf{U}^+_t$. Since $\mathbf{U}^-_t \cap \mathbf{U}^+_t = \{0\}$, we see that $y_P(t+k)$ is the oblique projection of $y_d(t+k)$ onto $\mathbf{U}^-_t$ along $\mathbf{U}^+_t$, so that

$$y_P(t+k) = E_{\|\mathbf{U}^+_t}\{y_d(t+k) \mid \mathbf{U}^-_t\} \tag{9.36}$$

and hence $\{y_P(t+k) \mid k = 0, 1, \ldots\}$ generates $\mathbf{X}^{d+/-}_t$ [see (9.26)]. Similarly, $y_F(t+k)$ is the oblique projection of $y_d(t+k)$ onto $\mathbf{U}^+_t$ along $\mathbf{U}^-_t$. Since $y_d(t+k) \in \mathbf{Y}^{d+}_t$ and $y_P(t+k) \in \mathbf{X}^{d+/-}_t$, it follows from (9.27) that

$$y_F(t+k) = y_d(t+k) - y_P(t+k) \in \hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^+_t$$

By combining this with (9.36) and (9.28), we get

$$y_d(t+k) = y_P(t+k) + y_F(t+k) \in (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^-_t) \dotplus (\hat{\mathbf{Y}}^{d+}_t \cap \mathbf{U}^+_t)$$

This implies that the two terms in the right-hand side of (9.35) are respectively the oblique projections of $y_d(t+k)$ onto the two subspaces of (9.29). Moreover, from (9.35), we can write $y_P(t+k) = \bar{C}\bar{A}^k\bar{x}(t)$, $k = 0, 1, \ldots$, so that the space $\mathbf{X}^{d+/-}_t$ is generated by $\{\bar{C}\bar{A}^k\bar{x}(t) \mid k = 0, 1, \ldots\}$. However, the latter is contained in $\bar{\mathbf{X}}_t := \operatorname{span}\{\bar{x}(t)\}$. Thus it follows that $\mathbf{X}^{d+/-}_t \subset \bar{\mathbf{X}}_t$, implying that $n = \dim(\mathbf{X}^{d+/-}_t) \le \dim(\bar{\mathbf{X}}_t) \le \bar{n}$.


This theorem shows that the state space $\bar{\mathbf{X}}_t$ of any realization of $y_d$ includes the state space $\mathbf{X}^{d+/-}_t$, so that $\mathbf{X}^{d+/-}_t$ is a state space with minimal dimension.


9.5.3 The Joint Model 


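As mentioned above, to obtain a state space realization for $y$, it suffices to combine the two realizations for the deterministic and stochastic components. From Theorems 9.2 and 9.3, we have the basic result of this chapter.


Theorem 9.4. A state space realization of $y$ is given by

$$\begin{bmatrix} x_d(t+1) \\ x_s(t+1) \end{bmatrix} = \begin{bmatrix} A_d & 0 \\ 0 & A_s \end{bmatrix}\begin{bmatrix} x_d(t) \\ x_s(t) \end{bmatrix} + \begin{bmatrix} B_d \\ 0 \end{bmatrix} u(t) + \begin{bmatrix} 0 \\ K_s \end{bmatrix} e(t) \tag{9.37a}$$

$$y(t) = \begin{bmatrix} C_d & C_s \end{bmatrix}\begin{bmatrix} x_d(t) \\ x_s(t) \end{bmatrix} + D_d u(t) + e(t) \tag{9.37b}$$

Proof. The proof is immediate from Theorems 9.2 and 9.3 by using the fact that $e_s = e$ (Lemma 9.6).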


We see from (9.37) that the state space model for the output process $y$ has a very natural block structure. The realization (9.37) is a particular form of the state space model (9.6) in that the $A$- and $B$-matrices have block structure. Hence the state vector $x_d$ of the deterministic subsystem is not reachable from the innovation process $e$, while the state vector $x_s$ of the stochastic subsystem is not reachable from the input vector $u$. Also, if the deterministic and stochastic subsystems have some common dynamics, i.e., if $A_d$ and $A_s$ have some common eigenvalues, then the system (9.37) is not observable, so that it is not minimal.
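The block structure of (9.37) can be assembled mechanically from the two subsystems. The sketch below builds the joint matrices and simulates the model from a zero initial state. The function names and the first-order example subsystems are hypothetical. With $e \equiv 0$ the stochastic state stays at zero, and with $u \equiv 0$ the deterministic state stays at zero, which reflects the reachability remarks above.

```python
import numpy as np


def joint_model(Ad, Bd, Cd, Dd, As, Ks, Cs):
    """Assemble the block-structured realization (9.37) from the two subsystems."""
    nd, ns = Ad.shape[0], As.shape[0]
    m, p = Bd.shape[1], Cd.shape[0]
    A = np.block([[Ad, np.zeros((nd, ns))], [np.zeros((ns, nd)), As]])
    B = np.vstack([Bd, np.zeros((ns, m))])
    K = np.vstack([np.zeros((nd, p)), Ks])
    C = np.hstack([Cd, Cs])
    return A, B, K, C, Dd


def simulate(A, B, K, C, D, u, e):
    """x(t+1) = A x(t) + B u(t) + K e(t),  y(t) = C x(t) + D u(t) + e(t),  x(0) = 0."""
    x = np.zeros(A.shape[0])
    X, Y = [], []
    for ut, et in zip(u, e):
        X.append(x)
        Y.append(C @ x + D @ ut + et)
        x = A @ x + B @ ut + K @ et
    return np.array(X), np.array(Y)


Ad, Bd, Cd, Dd = np.array([[0.8]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])
As, Ks, Cs = np.array([[0.5]]), np.array([[0.3]]), np.array([[1.0]])
A, B, K, C, D = joint_model(Ad, Bd, Cd, Dd, As, Ks, Cs)

rng = np.random.default_rng(0)
u = rng.standard_normal((50, 1))
e = rng.standard_normal((50, 1))
X_u, _ = simulate(A, B, K, C, D, u, np.zeros((50, 1)))   # input only
X_e, _ = simulate(A, B, K, C, D, np.zeros((50, 1)), e)   # innovation only
print(np.abs(X_u[:, 1]).max(), np.abs(X_e[:, 0]).max())  # 0.0 0.0
```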


Figure 9.6. Multivariable transfer function model 


Thus, as shown in Figure 9.6, the output $y$ is described by

$$y = P(z)u + H(z)e \tag{9.38}$$


254 9 Subspace Identification (1) - ORT 
where $P(z)$ and $H(z)$ are respectively defined by

$$P(z) = D_d + C_d(zI_n - A_d)^{-1}B_d$$

and

$$H(z) = I_p + C_s(zI_{n_s} - A_s)^{-1}K_s$$

It should be noted that the orthogonal decomposition method yields a multivariable input-output model in which the plant transfer matrix $P(z)$ and the noise transfer matrix $H(z)$ have independent parametrizations.
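Both $P(z)$ and $H(z)$ have the form $D + C(zI - A)^{-1}B$, so one helper evaluates either on the unit circle $z = e^{j\omega}$. For $H(z)$, pass $(A_s, K_s, C_s, I_p)$. The minimal sketch below has a hypothetical function name and uses a first-order example $P(z) = 1/(z - 0.5)$.

```python
import numpy as np


def freq_response(A, B, C, D, omegas):
    """Evaluate G(e^{jw}) = D + C (e^{jw} I - A)^{-1} B on a grid of frequencies (rad/sample)."""
    n = A.shape[0]
    return np.array([D + C @ np.linalg.solve(np.exp(1j * w) * np.eye(n) - A, B) for w in omegas])


# First-order example: P(z) = 1 / (z - 0.5), so P(1) = 2 and P(-1) = -2/3.
Ad, Bd, Cd, Dd = np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])
P = freq_response(Ad, Bd, Cd, Dd, [0.0, np.pi])
gain_db = 20 * np.log10(np.abs(P[:, 0, 0]))   # Bode gain in dB, as plotted in Section 9.8
```

Solving a linear system at each frequency, instead of forming $(zI - A)^{-1}$ explicitly, is the usual choice for numerical robustness.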

Up to now, we have considered the ideal case where infinite input-output data are available. In the following sections, we derive subspace identification methods by adapting the present realization results to given finite input-output data.


9.6 Realization Based on Finite Data 


In practice, we observe a finite collection of input-output data. In this section, we consider the realization based on finite data. Suppose that we have finite input-output data $\{u(t), y(t),\ t = 0, 1, \ldots, T\}$. Let the linear spaces generated by $u$ and $y$ be denoted by

$$\mathbf{U}_{[0,T]} = \operatorname{span}\{u(t) \mid t = 0, 1, \ldots, T\}$$
$$\mathbf{Y}_{[0,T]} = \operatorname{span}\{y(t) \mid t = 0, 1, \ldots, T\}$$

Also define the orthogonal complement of $\mathbf{U}_{[0,T]}$ in the joint space $\mathbf{U}_{[0,T]} \vee \mathbf{Y}_{[0,T]}$, which is denoted by $\mathbf{Z}_{[0,T]}$. Therefore, we have

$$\mathbf{U}_{[0,T]} \oplus \mathbf{Z}_{[0,T]} = \mathbf{U}_{[0,T]} \vee \mathbf{Y}_{[0,T]}$$

Lemma 9.8. For the deterministic subsystem (9.34), we define

$$\hat{y}_d(t) = E\{y_d(t) \mid \mathbf{U}_{[0,T]}\} = E\{y(t) \mid \mathbf{U}_{[0,T]}\}$$

Then the projected output $\hat{y}_d(t)$ is described by the state space model

$$\hat{x}_d(t+1) = A_d\hat{x}_d(t) + B_d u(t) \tag{9.39a}$$
$$\hat{y}_d(t) = C_d\hat{x}_d(t) + D_d u(t) \tag{9.39b}$$
$$\hat{x}_d(0) = E\{x_d(0) \mid \mathbf{U}_{[0,T]}\} \tag{9.39c}$$

where $\hat{x}_d(t) := E\{x_d(t) \mid \mathbf{U}_{[0,T]}\}$. It should be noted that the above equation is the same as (9.34), but the initial condition $\hat{x}_d(0)$ is different from $x_d(0)$, as shown in (9.39c).


Proof. Since $\mathbf{U} \supset \mathbf{U}_{[0,T]}$, we obtain (9.39) by projecting (9.34) onto $\mathbf{U}_{[0,T]}$.
Lemma 9.8 shows that we can identify the system matrices $(A_d, B_d, C_d, D_d)$ based on the data $\{u(t), \hat{y}_d(t),\ t = 0, 1, \ldots, T\}$ by using the subspace method described in the next section. Moreover, from the identified matrices and (9.39), we have

$$\hat{y}_d(t) = C_dA_d^t\hat{x}_d(0) + \sum_{i=0}^{t-1} C_dA_d^{t-i-1}B_du(i) + D_du(t) \tag{9.40}$$

for $t = 0, 1, \ldots, T$. Since the estimates of $(A_d, B_d, C_d, D_d)$ are given, we can obtain the estimate of the initial state vector $\hat{x}_d(0)$ of (9.39c) by applying the least-squares method to (9.40).
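For given $(A_d, B_d, C_d, D_d)$, equation (9.40) is linear in $\hat{x}_d(0)$. After the zero-state response is subtracted, the initial state follows from ordinary least squares. The minimal sketch below assumes noise-free projected outputs and an observable pair $(C_d, A_d)$. The function name and the example system are hypothetical.

```python
import numpy as np


def estimate_initial_state(A, B, C, D, u, y):
    """Least-squares fit of x(0) in y(t) = C A^t x(0) + sum_{i<t} C A^{t-i-1} B u(i) + D u(t)."""
    n = A.shape[0]
    x_forced = np.zeros(n)          # state driven by u alone, starting from zero
    At = np.eye(n)                  # A^t
    rows, resid = [], []
    for t in range(len(u)):
        rows.append(C @ At)                              # coefficient of x(0) at time t
        resid.append(y[t] - (C @ x_forced + D @ u[t]))   # output minus zero-state response
        x_forced = A @ x_forced + B @ u[t]
        At = A @ At
    x0, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(resid), rcond=None)
    return x0


rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
u = rng.standard_normal((30, 1))
x, y = np.array([1.0, -2.0]), []
for t in range(30):
    y.append(C @ x + D @ u[t])
    x = A @ x + B @ u[t]
y = np.array(y)
x0_hat = estimate_initial_state(A, B, C, D, u, y)   # recovers [1, -2] up to rounding
```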

We turn our attention to realization of the stochastic subsystem. In this case, we need to compute the orthogonal projection of the output data onto the orthogonal complement $\mathbf{Z}_{[0,T]}$, which is written as

$$\hat{y}_s(t) = y(t) - E\{y(t) \mid \mathbf{U}_{[0,T]}\}, \quad t = 0, 1, \ldots, T \tag{9.41}$$


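The next lemma clarifies the relation between the stochastic components $y_s$ and $\hat{y}_s$ defined above.


Lemma 9.9. Define the estimation error

$$\tilde{y}_d(t) = y_d(t) - \hat{y}_d(t), \quad t = 0, 1, \ldots, T$$

Then we have

$$\hat{y}_s(t) = y_s(t) + \tilde{y}_d(t), \quad t = 0, 1, \ldots, T \tag{9.42}$$

Proof. We first note that

$$y(t) = y_s(t) + y_d(t), \quad y_s(t) \perp \mathbf{U}$$

Since $\mathbf{U} \supset \mathbf{U}_{[0,T]}$, it follows from (9.41) that

$$\begin{aligned}
\hat{y}_s(t) &= y_s(t) + y_d(t) - E\{y_s(t) + y_d(t) \mid \mathbf{U}_{[0,T]}\} \\
&= y_s(t) + \big(y_d(t) - E\{y_d(t) \mid \mathbf{U}_{[0,T]}\}\big) = y_s(t) + \tilde{y}_d(t)
\end{aligned}$$

as was to be proved, where we used the fact that $E\{y_s(t) \mid \mathbf{U}_{[0,T]}\} = 0$.


Lemma 9.9 means that for a finite data case, the output $\hat{y}_s(t)$ projected onto the complement $\mathbf{Z}_{[0,T]}$ is different from the true stochastic component defined by $y_s(t) = E\{y(t) \mid \mathbf{U}^\perp\}$, because the former is perturbed by the smoothing error $\tilde{y}_d(t)$ of the deterministic component.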

Define the vector $\tilde{x}_d(t) := x_d(t) - \hat{x}_d(t)$. It then follows from (9.34) and (9.39) that

$$\tilde{x}_d(t+1) = A_d\tilde{x}_d(t), \quad \tilde{y}_d(t) = C_d\tilde{x}_d(t)$$

so that the term acting on the stochastic component $y_s(t)$ is expressed as

$$\tilde{y}_d(t) = C_dA_d^t\tilde{x}_d(0), \quad t = 0, 1, \ldots, T \tag{9.43}$$

Thus the estimation error $\tilde{x}_d(0)$ influences the stochastic component as well as the deterministic component. If we ignore the additive term $\tilde{y}_d(t)$ in (9.42), we may identify stochastic subsystems with higher dimensions than the true order $n_s$. Hence it is desirable to filter out this additive term by some means. For more detail, see [30, 131].


9.7 Subspace Identification Method — ORT Method 


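In this section, we develop subspace methods for identifying state space models based on finite input-output data. In the sequel, the subspace method is called the ORT method, since the identification methods developed in this section are based on the orthogonal decomposition technique introduced in Section 9.4.

Suppose that the input-output data $\{u(t), y(t),\ t = 0, 1, \ldots, N+2k-2\}$ are given with $N$ sufficiently large and $k > n$. Based on the input-output data, we define as usual the block Hankel matrices

$$U_{0|k-1} = \begin{bmatrix} u(0) & u(1) & \cdots & u(N-1) \\ u(1) & u(2) & \cdots & u(N) \\ \vdots & \vdots & \ddots & \vdots \\ u(k-1) & u(k) & \cdots & u(N+k-2) \end{bmatrix} \in \mathbb{R}^{km\times N}$$

and

$$U_{k|2k-1} = \begin{bmatrix} u(k) & u(k+1) & \cdots & u(k+N-1) \\ u(k+1) & u(k+2) & \cdots & u(k+N) \\ \vdots & \vdots & \ddots & \vdots \\ u(2k-1) & u(2k) & \cdots & u(N+2k-2) \end{bmatrix} \in \mathbb{R}^{km\times N}$$

Similarly, we define $Y_{0|k-1}, Y_{k|2k-1} \in \mathbb{R}^{kp\times N}$, and also

$$U_{0|2k-1} := \begin{bmatrix} U_{0|k-1} \\ U_{k|2k-1} \end{bmatrix}, \quad Y_{0|2k-1} := \begin{bmatrix} Y_{0|k-1} \\ Y_{k|2k-1} \end{bmatrix}$$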
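The block Hankel matrices above can be built by stacking shifted windows of the data. The sketch below stores samples as a `(T, dim)` array, which is an assumption of this example. The helper name `block_hankel` is hypothetical.

```python
import numpy as np


def block_hankel(w, start, k, N):
    """Block Hankel matrix with block rows w(start), ..., w(start+k-1) and N columns.

    w has shape (T, dim); column j of block row i holds w(start + i + j).
    """
    return np.vstack([w[start + i: start + i + N].T for i in range(k)])


k, N, m = 3, 5, 2
T = N + 2 * k - 1                        # samples t = 0, ..., N + 2k - 2
u = np.arange(T * m, dtype=float).reshape(T, m)
U_past = block_hankel(u, 0, k, N)        # U_{0|k-1}
U_future = block_hankel(u, k, k, N)      # U_{k|2k-1}
U_all = np.vstack([U_past, U_future])    # U_{0|2k-1}
print(U_past.shape, U_all.shape)         # (6, 5) (12, 5)
```

Stacking the past and future matrices gives the same result as building a single Hankel matrix with $2k$ block rows. The last future entry uses the final sample $u(N+2k-2)$, so no data are wasted.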


9.7.1 Subspace Identification of Deterministic Subsystem 


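Consider the subspace identification of the deterministic subsystem (9.39). We define the extended observability matrix as

$$\mathcal{O}_k = \begin{bmatrix} C_d \\ C_dA_d \\ \vdots \\ C_dA_d^{k-1} \end{bmatrix} \in \mathbb{R}^{kp\times n}, \quad k > n$$

and the block lower triangular Toeplitz matrix as

$$\Psi_k(B_d, D_d) = \begin{bmatrix} D_d & 0 & \cdots & 0 \\ C_dB_d & D_d & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ C_dA_d^{k-2}B_d & \cdots & C_dB_d & D_d \end{bmatrix} \in \mathbb{R}^{kp\times km}$$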
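A quick way to check the definitions of $\mathcal{O}_k$ and $\Psi_k$ is to build them numerically and verify the data equation $Y_{k|2k-1} = \mathcal{O}_k X_k + \Psi_k U_{k|2k-1}$ on noise-free data generated by a known system. This is the form of (9.44) when there is no stochastic component. The helper names and the second-order example are hypothetical.

```python
import numpy as np


def extended_observability(A, C, k):
    """O_k = [C; C A; ...; C A^{k-1}]."""
    blocks, CAi = [], C
    for _ in range(k):
        blocks.append(CAi)
        CAi = CAi @ A
    return np.vstack(blocks)


def toeplitz_psi(A, B, C, D, k):
    """Block lower triangular Toeplitz matrix: D on the diagonal, C A^{i-j-1} B below it."""
    p, m = D.shape
    W = [D] + [C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, k)]
    Psi = np.zeros((k * p, k * m))
    for i in range(k):
        for j in range(i + 1):
            Psi[i * p:(i + 1) * p, j * m:(j + 1) * m] = W[i - j]
    return Psi


def block_hankel(w, start, k, N):
    return np.vstack([w[start + i: start + i + N].T for i in range(k)])


rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.2]])
k, N = 4, 50
T = N + 2 * k - 1
u = rng.standard_normal((T, 1))
x, xs, ys = np.zeros(2), [], []
for t in range(T):
    xs.append(x)
    ys.append(C @ x + D @ u[t])
    x = A @ x + B @ u[t]
xs, ys = np.array(xs), np.array(ys)

Xk = xs[k:k + N].T                       # [x(k), ..., x(k+N-1)]
Yf = block_hankel(ys, k, k, N)           # Y_{k|2k-1}
Uf = block_hankel(u, k, k, N)            # U_{k|2k-1}
O, Psi = extended_observability(A, C, k), toeplitz_psi(A, B, C, D, k)
print(np.allclose(Yf, O @ Xk + Psi @ Uf))   # True
```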


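By iteratively using (9.39a) and (9.39b), we obtain a matrix input-output equation of the form [see (6.23)]

$$\hat{Y}^d_{k|2k-1} = \mathcal{O}_k\hat{X}^d_k + \Psi_kU_{k|2k-1} \tag{9.44}$$

where $\hat{Y}^d_{k|2k-1}$ is the block Hankel matrix generated by $\hat{y}_d$ and $\hat{X}^d_k$ is defined by

$$\hat{X}^d_k = \begin{bmatrix} \hat{x}_d(k) & \hat{x}_d(k+1) & \cdots & \hat{x}_d(k+N-1) \end{bmatrix}$$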

We need the following assumption for applying the subspace method to the system (9.44); see Section 6.3.


Assumption 9.3.
A1) $\operatorname{rank}(\hat{X}^d_k) = n$.
A2) $\operatorname{rank}(U_{0|2k-1}) = 2km$.
A3) $\operatorname{span}(\hat{X}^d_k) \cap \operatorname{span}(U_{k|2k-1}) = \{0\}$.


As mentioned in Section 6.3, Assumption A1) implies that the state vectors $\hat{x}_d(k), \hat{x}_d(k+1), \ldots$ are sufficiently excited. The condition A2) implies that the input process $u$ satisfies the PE condition of order $2k$, and A3) is guaranteed by the feedback-free condition.

First we present a method of computing the deterministic component of the output process. Following the idea of the MOESP method of Section 6.5, we consider the LQ decomposition

$$\begin{bmatrix} U_{0|2k-1} \\ Y_{0|2k-1} \end{bmatrix} = \begin{bmatrix} R_{11} & 0 \\ R_{21} & R_{22} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{9.45}$$

where $R_{11} \in \mathbb{R}^{2km\times 2km}$ and $R_{22} \in \mathbb{R}^{2kp\times 2kp}$ are block lower triangular matrices, and $Q_1 \in \mathbb{R}^{N\times 2km}$ and $Q_2 \in \mathbb{R}^{N\times 2kp}$ are orthogonal matrices. It follows from A2) of Assumption 9.3 that $R_{11}$ is nonsingular, so that we have

$$Y_{0|2k-1} = R_{21}Q_1^T + R_{22}Q_2^T = R_{21}R_{11}^{-1}U_{0|2k-1} + R_{22}Q_2^T$$

We see that $R_{21}Q_1^T$ belongs to the row space of $U_{0|2k-1}$, and $R_{22}Q_2^T$ is orthogonal to it since $Q_1^TQ_2 = 0$. Hence, $R_{21}Q_1^T$ is the orthogonal projection of $Y_{0|2k-1}$ onto the row space of $U_{0|2k-1}$. Thus, the deterministic component is given by

$$\hat{Y}^d_{0|2k-1} = R_{21}Q_1^T = R_{21}R_{11}^{-1}U_{0|2k-1} \tag{9.46}$$

and hence the stochastic component is

$$\hat{Y}^s_{0|2k-1} := Y_{0|2k-1} - \hat{Y}^d_{0|2k-1} = R_{22}Q_2^T \tag{9.47}$$
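The split (9.46)-(9.47) can be computed with a standard QR routine, because the LQ decomposition of a matrix $M$ is the transposed QR decomposition of $M^T$. The minimal sketch below uses random matrices that stand in for $U_{0|2k-1}$ and $Y_{0|2k-1}$ and assumes $U$ has full row rank, as in A2). The helper names are hypothetical.

```python
import numpy as np


def lq(M):
    """LQ decomposition M = L @ Q.T obtained from the QR decomposition of M.T."""
    Q, R = np.linalg.qr(M.T)     # M.T = Q R, so M = R.T Q.T; Q has orthonormal columns
    return R.T, Q


def orthogonal_split(U, Y):
    """Split Y into R21 Q1^T (projection onto rowspace(U)) and R22 Q2^T (orthogonal remainder)."""
    r = U.shape[0]
    L, Q = lq(np.vstack([U, Y]))
    Y_det = L[r:, :r] @ Q[:, :r].T     # R21 Q1^T, as in (9.46)
    Y_sto = L[r:, r:] @ Q[:, r:].T     # R22 Q2^T, as in (9.47)
    return Y_det, Y_sto


rng = np.random.default_rng(0)
U = rng.standard_normal((6, 400))      # stands in for U_{0|2k-1}
Y = rng.standard_normal((4, 6)) @ U + 0.1 * rng.standard_normal((4, 400))
Y_det, Y_sto = orthogonal_split(U, Y)
print(np.allclose(Y, Y_det + Y_sto), np.allclose(Y_sto @ U.T, 0.0, atol=1e-8))   # True True
```

The LQ route avoids forming $R_{11}^{-1}$ or $(UU^T)^{-1}$ explicitly, which is one reason it is preferred over the normal-equation formula $YU^T(UU^T)^{-1}U$ when $N$ is large.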


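Bearing the above facts in mind, we consider a related LQ decomposition

$$\begin{bmatrix} U_{k|2k-1} \\ U_{0|k-1} \\ Y_{0|k-1} \\ Y_{k|2k-1} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 & 0 \\ L_{21} & L_{22} & 0 & 0 \\ L_{31} & L_{32} & L_{33} & 0 \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \\ Q_4^T \end{bmatrix} \tag{9.48}$$

where $L_{11}, L_{22} \in \mathbb{R}^{km\times km}$ and $L_{33}, L_{44} \in \mathbb{R}^{kp\times kp}$ are block lower triangular matrices, and $Q_1, Q_2 \in \mathbb{R}^{N\times km}$ and $Q_3, Q_4 \in \mathbb{R}^{N\times kp}$ are orthogonal matrices. In view of (9.46) and (9.48), the deterministic component $\hat{Y}^d_{k|2k-1}$ is given by

$$\hat{Y}^d_{k|2k-1} = L_{41}Q_1^T + L_{42}Q_2^T$$

On the other hand, from (9.44) and (9.48), we have

$$\Psi_kL_{11}Q_1^T + \mathcal{O}_k\hat{X}^d_k = L_{41}Q_1^T + L_{42}Q_2^T \tag{9.49}$$

Post-multiplying (9.49) by $Q_2$ yields $\mathcal{O}_k\hat{X}^d_kQ_2 = L_{42}$. Since $\hat{X}^d_kQ_2$ has full row rank from A1) of Assumption 9.3, we have

$$\operatorname{Im}(\mathcal{O}_k) = \operatorname{Im}(L_{42}) \tag{9.50}$$

Similarly, pre-multiplying (9.49) by a matrix $(\mathcal{O}_k^\perp)^T$ satisfying $(\mathcal{O}_k^\perp)^T\mathcal{O}_k = 0$, and post-multiplying by $Q_1$, yield

$$(\mathcal{O}_k^\perp)^TL_{41} = (\mathcal{O}_k^\perp)^T\Psi_k(B_d, D_d)L_{11} \tag{9.51}$$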


Making use of (9.50) and (9.51), we can derive a subspace method of identifying the deterministic subsystem. In the following, we assume that the LQ decomposition (9.48) is given.

Subspace Identification of Deterministic Subsystem - ORT Method

Step 1: Compute the SVD of $L_{42}$:

$$L_{42} = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\begin{bmatrix} S_1 & 0 \\ 0 & S_2 \end{bmatrix}\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \simeq U_1S_1V_1^T \tag{9.52}$$

where $S_1$ is obtained by neglecting the smaller singular values, so that the dimension of the state vector equals $\dim(S_1)$. Thus, the decomposition

$$U_1S_1V_1^T = (U_1S_1^{1/2})(S_1^{1/2}V_1^T)$$

gives the extended observability matrix $\mathcal{O}_k = U_1S_1^{1/2}$.

Step 2: Compute the estimates of $A_d$ and $C_d$ by

$$\hat{A}_d = \mathcal{O}_{k-1}^\dagger\overline{\mathcal{O}}_k, \quad \hat{C}_d = \mathcal{O}_k(1:p, :) \tag{9.53}$$

where $\mathcal{O}_{k-1}$ and $\overline{\mathcal{O}}_k$ denote the matrices obtained by deleting the last $p$ rows and the first $p$ rows from $\mathcal{O}_k$, respectively.

Step 3: Given the estimates of $A_d$ and $C_d$, the Toeplitz matrix $\Psi_k$ becomes linear with respect to $B_d$ and $D_d$. By using $U_2^T$ of (9.52) for $(\mathcal{O}_k^\perp)^T$, it follows from (9.51) that

$$U_2^TL_{41}L_{11}^{-1} = U_2^T\Psi_k(B_d, D_d) \tag{9.54}$$

Then we can obtain the least-squares estimates of $B_d$ and $D_d$ by rewriting the above equation as (6.44) in the MOESP algorithm.
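A minimal sketch of Steps 1 and 2 follows. It stacks the data in the order of (9.48), extracts $L_{42}$, and applies the SVD and shift invariance. It rests on several assumptions: noise-free data from a hypothetical second-order system, an order $n$ supplied by the user instead of being chosen from the singular values, and Step 3 (the estimation of $B_d$ and $D_d$) omitted. The estimated $\hat{A}_d$ is only similar to the true $A_d$, so the check compares eigenvalues.

```python
import numpy as np


def block_hankel(w, start, k, N):
    return np.vstack([w[start + i: start + i + N].T for i in range(k)])


def ort_deterministic_ac(u, y, k, n):
    """Sketch of Steps 1-2 of the ORT deterministic algorithm: estimate A_d, C_d from L42."""
    m, p = u.shape[1], y.shape[1]
    N = u.shape[0] - 2 * k + 1
    Up, Uf = block_hankel(u, 0, k, N), block_hankel(u, k, k, N)
    Yp, Yf = block_hankel(y, 0, k, N), block_hankel(y, k, k, N)
    Q, R = np.linalg.qr(np.vstack([Uf, Up, Yp, Yf]).T)
    L = R.T                                          # LQ decomposition, row order as in (9.48)
    km, kp = k * m, k * p
    L42 = L[2 * km + kp:, km:2 * km]                 # Y_{k|2k-1} rows, Q_2 columns
    U1, s, _ = np.linalg.svd(L42)                    # Step 1: SVD (9.52)
    Ok = U1[:, :n] * np.sqrt(s[:n])                  # O_k = U_1 S_1^{1/2}
    Ad = np.linalg.pinv(Ok[:-p]) @ Ok[p:]            # Step 2: O_{k-1}^+ times shifted O_k
    Cd = Ok[:p]
    return Ad, Cd, s


rng = np.random.default_rng(1)
A_true = np.diag([0.8, 0.5])
B_true = np.array([[1.0], [1.0]])
C_true = np.array([[1.0, 1.0]])
k = 5
T = 1000 + 2 * k - 1
u = rng.standard_normal((T, 1))
x, y = np.zeros(2), np.zeros((T, 1))
for t in range(T):
    y[t] = C_true @ x
    x = A_true @ x + B_true @ u[t]

Ad, Cd, s = ort_deterministic_ac(u, y, k, n=2)
print(np.sort(np.linalg.eigvals(Ad).real))   # approximately [0.5, 0.8]
```

With noisy data the trailing singular values `s[n:]` no longer vanish, and the order has to be chosen from the gap, as described in Step 1.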


Remark 9.3. In [171] (Theorems 2 and 4), the LQ decomposition (9.48) is used to develop the PO-MOESP algorithm. More precisely, the relations

$$\operatorname{Im}(\mathcal{O}_k) = \operatorname{Im}\begin{bmatrix} L_{42} & L_{43} \end{bmatrix}$$

and

$$U_2^T\begin{bmatrix} L_{31} & L_{32} & L_{41} \end{bmatrix} = U_2^T\Psi_k(B_d, D_d)\begin{bmatrix} L_{21} & L_{22} & L_{11} \end{bmatrix}$$

are employed. These relations are different from (9.50) and (9.51); this is where the ORT method differs from the PO-MOESP method.


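9.7.2 Subspace Identification of Stochastic Subsystem


We derive a subspace identification algorithm for the stochastic subsystem. For data matrices, we use the same notation as in Subsection 9.7.1. As shown in (9.47), the stochastic component is given by

$$\hat{Y}^s_{0|2k-1} = Y_{0|2k-1} - \hat{Y}^d_{0|2k-1}$$

It follows from (9.48) that

$$\hat{Y}^s_{0|2k-1} = \begin{bmatrix} L_{33} & 0 \\ L_{43} & L_{44} \end{bmatrix}\begin{bmatrix} Q_3^T \\ Q_4^T \end{bmatrix} \tag{9.55}$$

so that we define

$$\hat{Y}^s_{0|k-1} = L_{33}Q_3^T, \quad \hat{Y}^s_{k|2k-1} = L_{43}Q_3^T + L_{44}Q_4^T$$

The sample covariance matrices of the stochastic components are then given by

$$\begin{bmatrix} \Sigma_{ff} & \Sigma_{fp} \\ \Sigma_{pf} & \Sigma_{pp} \end{bmatrix} = \frac{1}{N}\begin{bmatrix} \hat{Y}^s_{k|2k-1} \\ \hat{Y}^s_{0|k-1} \end{bmatrix}\begin{bmatrix} (\hat{Y}^s_{k|2k-1})^T & (\hat{Y}^s_{0|k-1})^T \end{bmatrix}$$

Thus we have

$$\Sigma_{ff} = \frac{1}{N}(L_{43}L_{43}^T + L_{44}L_{44}^T), \quad \Sigma_{fp} = \frac{1}{N}L_{43}L_{33}^T, \quad \Sigma_{pp} = \frac{1}{N}L_{33}L_{33}^T$$

The following algorithms are based on the stochastic subspace identification algorithms derived in Section 8.7.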
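Subspace Identification of Stochastic Subsystem - ORT Method

Algorithm A

Step 1: Compute square roots of the covariance matrices $\Sigma_{ff}$ and $\Sigma_{pp}$ satisfying

$$\Sigma_{ff} = LL^T, \quad \Sigma_{pp} = MM^T$$

Step 2: Compute the SVD of the normalized covariance matrix

$$L^{-1}\Sigma_{fp}M^{-T} = U\Sigma V^T \simeq \tilde{U}\tilde{\Sigma}\tilde{V}^T$$

where $\tilde{\Sigma}$ is obtained by deleting sufficiently small singular values of $\Sigma$, so that the dimension of the state vector becomes $n_s = \dim\tilde{\Sigma}$.

Step 3: Compute the extended observability and reachability matrices by

$$\mathcal{O}_k = L\tilde{U}\tilde{\Sigma}^{1/2}, \quad \mathcal{C}_k = \tilde{\Sigma}^{1/2}\tilde{V}^TM^T$$

Step A4: Compute the estimates of $A_s$, $C_s$ and $\bar{C}_s$ by

$$A_s = \mathcal{O}_{k-1}^\dagger\overline{\mathcal{O}}_k, \quad C_s = \mathcal{O}_k(1:p, :), \quad \bar{C}_s = \mathcal{C}_k(:, 1:p)$$

where $\overline{\mathcal{O}}_k = \mathcal{O}_k(p+1:kp, :)$ and $\mathcal{O}_{k-1} = \mathcal{O}_k(1:(k-1)p, :)$.

Step A5: The covariance matrix of $y_s$ is given by $\Lambda_s(0) := \Sigma_{ff}(1:p, 1:p)$. By using $(A_s, C_s, \bar{C}_s, \Lambda_s(0))$, the Kalman gain is given by

$$K_s = (\bar{C}_s^T - A_s\tilde{\Sigma}C_s^T)(\Lambda_s(0) - C_s\tilde{\Sigma}C_s^T)^{-1}$$

so that the innovation model becomes

$$x_s(t+1) = A_sx_s(t) + K_se(t)$$
$$y_s(t) = C_sx_s(t) + e(t)$$

where $\operatorname{var}\{e(t)\} = \Lambda_s(0) - C_s\tilde{\Sigma}C_s^T$.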
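Steps 1-3 and A4 of Algorithm A can be sketched in a few lines. The sketch rests on several assumptions. The data are already purely stochastic, with no exogenous input. The covariances are formed directly from the Hankel data matrices, which matches the $L_{33}, L_{43}, L_{44}$ expressions because $Q$ has orthonormal columns. Cholesky factors serve as the square roots. $\bar{C}_s$ and Step A5 are omitted. The example system, $x_s(t+1) = 0.8x_s(t) + 0.5e(t)$ and $y_s(t) = x_s(t) + e(t)$, and the function names are hypothetical.

```python
import numpy as np


def block_hankel(w, start, k, N):
    return np.vstack([w[start + i: start + i + N].T for i in range(k)])


def stochastic_steps_1_to_a4(ys, k, n):
    """Sketch of Steps 1-3 and A4 for a stochastic output ys of shape (T, p)."""
    p = ys.shape[1]
    N = ys.shape[0] - 2 * k + 1
    Yp, Yf = block_hankel(ys, 0, k, N), block_hankel(ys, k, k, N)
    Sff, Sfp, Spp = Yf @ Yf.T / N, Yf @ Yp.T / N, Yp @ Yp.T / N
    L = np.linalg.cholesky(Sff)                   # Step 1: Sff = L L^T
    M = np.linalg.cholesky(Spp)                   #         Spp = M M^T
    normalized = np.linalg.solve(L, np.linalg.solve(M, Sfp.T).T)   # L^{-1} Sfp M^{-T}
    U, s, Vt = np.linalg.svd(normalized)          # Step 2
    Ok = L @ U[:, :n] * np.sqrt(s[:n])            # Step 3: O_k = L U~ S~^{1/2}
    As = np.linalg.pinv(Ok[:-p]) @ Ok[p:]         # Step A4
    Cs = Ok[:p]
    return As, Cs, s


rng = np.random.default_rng(0)
T = 20000
e = rng.standard_normal(T)
x, ys = 0.0, np.zeros((T, 1))
for t in range(T):
    ys[t, 0] = x + e[t]
    x = 0.8 * x + 0.5 * e[t]

As, Cs, s = stochastic_steps_1_to_a4(ys, k=5, n=1)
print(As)   # close to [[0.8]]
```

The singular values `s` are the sample canonical correlations between future and past. A clear drop after the first `n` entries is what justifies the truncation in Step 2.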


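Now we present an algorithm that gives a state space model satisfying the positivity condition, where Steps 1-3 are the same as those of Algorithm A.


Algorithm B

Step B4: Compute the estimates of the state vectors by

$$\hat{X}_k = \tilde{\Sigma}^{1/2}\tilde{V}^TM^{-1}\hat{Y}^s_{0|k-1} \in \mathbb{R}^{n_s\times N}$$

and define matrices with $N-1$ columns

$$\bar{X}_{k+1} = \hat{X}_k(:, 2:N), \quad \bar{X}_k = \hat{X}_k(:, 1:N-1), \quad \bar{Y}_{k|k} = \hat{Y}^s_{k|k}(:, 1:N-1)$$

Step B5: Compute the estimates of the matrices $A_s$ and $C_s$ by applying the least-squares method to the overdetermined equations

$$\begin{bmatrix} \bar{X}_{k+1} \\ \bar{Y}_{k|k} \end{bmatrix} = \begin{bmatrix} A_s \\ C_s \end{bmatrix}\bar{X}_k + \begin{bmatrix} \rho_w \\ \rho_v \end{bmatrix}$$

where $\rho_w$ and $\rho_v$ are residuals.

Step B6: Compute the covariance matrices of the residuals

$$\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} = \frac{1}{N-1}\begin{bmatrix} \rho_w \\ \rho_v \end{bmatrix}\begin{bmatrix} \rho_w^T & \rho_v^T \end{bmatrix}$$

and solve the ARE [see (5.67)]

$$P = A_sPA_s^T - (A_sPC_s^T + S)(C_sPC_s^T + R)^{-1}(A_sPC_s^T + S)^T + Q$$

In terms of the stabilizing solution $P \ge 0$, we have the Kalman gain

$$K_s = (A_sPC_s^T + S)(C_sPC_s^T + R)^{-1}$$

Step B7: The innovation model is then given by

$$\hat{x}_s(t+1) = A_s\hat{x}_s(t) + K_s\hat{e}(t)$$
$$y_s(t) = C_s\hat{x}_s(t) + \hat{e}(t)$$

where $\operatorname{var}\{\hat{e}(t)\} = C_sPC_s^T + R$.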
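Step B6 requires the stabilizing solution of the filter ARE. The minimal sketch below finds it by iterating the Riccati difference equation from $P = 0$. Under the usual detectability and stabilizability conditions this iteration is known to converge to the stabilizing solution. The function name, the iteration limits and the example matrices are assumptions. A dedicated solver such as SciPy's `scipy.linalg.solve_discrete_are`, applied to the dual problem `(A.T, C.T, Q, R, s=S)`, should return the same $P$.

```python
import numpy as np


def kalman_gain_from_are(A, C, Q, S, R, iters=5000, tol=1e-12):
    """Iterate P <- A P A^T - (A P C^T + S)(C P C^T + R)^{-1}(A P C^T + S)^T + Q from P = 0."""
    P = np.zeros_like(A)
    for _ in range(iters):
        G = A @ P @ C.T + S
        P_next = A @ P @ A.T - G @ np.linalg.solve(C @ P @ C.T + R, G.T) + Q
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = (A @ P @ C.T + S) @ np.linalg.inv(C @ P @ C.T + R)
    return P, K


A = np.array([[0.9, 0.1], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
Q, S, R = 0.1 * np.eye(2), np.zeros((2, 1)), np.array([[1.0]])
P, K = kalman_gain_from_are(A, C, Q, S, R)

G = A @ P @ C.T + S
residual = A @ P @ A.T - G @ np.linalg.solve(C @ P @ C.T + R, G.T) + Q - P
print(np.max(np.abs(residual)) < 1e-9, np.max(np.abs(np.linalg.eigvals(A - K @ C))) < 1)   # True True
```

The second check confirms that $A_s - K_sC_s$ is stable, which is the property that makes the resulting innovation model usable as a one-step predictor.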
The above algorithm is basically the same as Algorithm B of the stochastic balanced realization developed in Section 8.7. Since the covariance matrix of residuals obtained in Step B6 is always nonnegative definite and $(C_s, A_s)$ is observable, the ARE has a unique stabilizing solution, from which we can compute the Kalman gain. Thus the present algorithm ensures the positivity condition of the stochastic subsystem; see also Remark 8.2.


9.8 Numerical Example 


Some numerical results obtained by using the ORT and PO-MOESP methods are presented. The simulation model used is depicted in Figure 9.7, where the plant is the 5th-order model [175]


Figure 9.7. Simulation model 
$$P(z) = \frac{0.0275z^{-4} + 0.0551z^{-5}}{1 - 2.3443z^{-1} + 3.081z^{-2} - 2.5274z^{-3} + 1.2415z^{-4} - 0.3686z^{-5}}$$

and where the input signal generation model $C(z)$ and the noise model $H(z)$ are respectively given by

$$C(z) = \frac{\sqrt{1 - a^2}}{1 - az^{-1}}, \qquad H(z) = \frac{\cdots\, 0.22z^{-1} - 0.48z^{-2}}{\cdots}$$


The noises $e$ and $e_u$ are mutually uncorrelated white noises with distributions $N(0, \sigma^2)$ and $N(0, 1)$, respectively. Thus the spectral density functions of $u$ and $v$ are given by

$$\Phi_u(\omega) = |C(e^{j\omega})|^2, \quad \Phi_v(\omega) = \sigma^2|H(e^{j\omega})|^2$$

so that their powers are proportional to the gain plots of $C(z)$ with $a = 0.9$ and of $H(z)$ shown in Figure 9.8, respectively.
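The factor $\sqrt{1 - a^2}$ in $C(z)$ normalizes the input to unit variance for every $a$, since the stationary variance of $u(t) = au(t-1) + \sqrt{1-a^2}\,e_u(t)$ is $(1-a^2)/(1-a^2) = 1$. The DC gain is $|C(1)|^2 = (1+a)/(1-a)$. The sketch below checks both properties. The function name, the sample length and the seed are illustrative assumptions.

```python
import numpy as np


def colored_input(a, T, rng):
    """Filter N(0,1) white noise e_u through C(z) = sqrt(1 - a^2) / (1 - a z^{-1})."""
    e_u = rng.standard_normal(T)
    u = np.zeros(T)
    prev = 0.0
    for t in range(T):
        prev = a * prev + np.sqrt(1.0 - a ** 2) * e_u[t]
        u[t] = prev
    return u


a = 0.9
rng = np.random.default_rng(0)
u = colored_input(a, 200000, rng)
print(round(float(np.var(u)), 2))                      # approximately 1.0
dc_gain_db = 10 * np.log10((1 - a ** 2) / (1 - a) ** 2)  # |C(1)|^2 in dB, about 12.79 for a = 0.9
```

Because the input variance is fixed at one, the higher output S/N ratio for $a = 0.9$ comes entirely from how the input power is redistributed toward low frequencies, where the plant gain is larger.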


Figure 9.8. The Bode gain plots of $P(z)$, $C(z)$ and $H(z)$ (gain [dB] versus frequency [rad/sec])


In the present simulations, we consider four cases according to whether $u$ and/or $v$ are white or colored noises, where the variance $\sigma^2$ of the noise $e$ is adjusted so that the variance of $v$ becomes approximately $\sigma_v^2 = 0.01$. Also, the variance of the output $y_d = P(z)u$ changes according to the spectrum of $u$, so that the S/N ratio in the output becomes

$$\sigma_d^2/\sigma_v^2 \simeq \begin{cases} 10.73, & \text{white noise } (a = 0) \\ 51.43, & \text{colored noise } (a = 0.9) \end{cases}$$

In each simulation, we take the number of data points $N = 1000$ and the number of block rows $k = 15$. We generated 30 data sets by using pseudo-random numbers. In order to compare simulation results, we have used the same pseudo-random numbers in each case.

Case 1: We show the simulation results for the case where the input signal $u$ is a white noise. Figure 9.9 displays the Bode gain plots of the plant obtained by applying the ORT method, where $v$ is a white noise in Figure 9.9(a) but a colored noise in Figure 9.9(b), and where the solid line denotes the Bode gain of the true plant. Also, Figure 9.10 displays the corresponding results by the PO-MOESP method. We see that there is no clear difference between the results of Figures 9.9(a) and 9.10(a). However, there are some differences between the results of Figures 9.9(b) and 9.10(b): the ORT method gives a slightly better result in the high frequency range, and we observe a small bias in the low frequency range between 0.5 and 1 rad in the estimates by the PO-MOESP method.

Figure 9.9. Identification results by ORT method: (a) u = white, v = white; (b) u = white, v = colored (gain [dB] versus frequency [rad/sec])

Figure 9.10. Identification results by PO-MOESP method: (a) u = white, v = white; (b) u = white, v = colored (gain [dB] versus frequency [rad/sec])

Case 2: We consider the case where $u$ is a colored noise. Figure 9.11 displays the results obtained by applying the ORT method, and Figure 9.12 those obtained by using the PO-MOESP method. Since the power of the input $u$ decreases in the high frequency range, as shown in Figure 9.8, the accuracy of the estimate degrades in the high frequency range.

Though the output S/N ratio for colored noise input is higher than for white noise input, the accuracy of identification is inferior to the white noise case. There is no clear difference between Figures 9.11(a) and 9.12(a) for white observation noise. However, if $v$ is a colored noise, there are appreciable differences between the results of the ORT and PO-MOESP methods in both the low and high frequency ranges, as shown in Figures 9.11(b) and 9.12(b).
Figure 9.11. Identification results by ORT method: (a) u = colored, v = white; (b) u = colored, v = colored

Figure 9.12. Identification results by PO-MOESP method: (a) u = colored, v = white; (b) u = colored, v = colored


One reason for the considerable difference between the results of the ORT and PO-MOESP methods under colored observation noise is that the noise $v$ has large power above 1.5 rad, where the input $u$ has low power. In fact, the peak of the gain characteristic shown in Figure 9.12(b) is located around 1.8 rad, which coincides with the peak of the noise spectrum in Figure 9.8. In the PO-MOESP method, the frequency components beyond 1.5 rad are erroneously identified as those due to the input signal $u$ rather than the noise $v$. In the ORT method, on the other hand, the data are first decomposed into the deterministic and stochastic components by the preliminary orthogonal decomposition. This decomposition appears to work well even when the noise is colored, as long as the exogenous input satisfies a PE condition of sufficiently high order. For the differences between the two algorithms, see Remark 9.3 and the programs listed in Tables D.5 and D.7.

Further simulation studies showing the superiority of the ORT method based on the preliminary orthogonal decomposition are provided in Section 10.7, in which the results of three subspace identification methods are compared.
9.9 Notes and References


• Based on Picci and Katayama [130, 131], we have presented realization results for the stochastic system with exogenous inputs, under the assumption that there is no feedback from the output to the input and that the input satisfies a sufficiently high-order PE condition. The main idea is to decompose the output process into the deterministic and stochastic components, from which we have derived a state space model with a natural block structure.


• In Section 9.1, projections in a Hilbert space are reviewed and the property of conditional orthogonality is introduced. Then Section 9.2 has formulated the stochastic realization problem in the presence of exogenous inputs, and Section 9.3 discussed the feedback-free conditions in detail based on [24, 25, 53].


• In Section 9.4, we have considered the PE condition of the exogenous inputs, and developed a method of decomposing the output process into the deterministic component and the stochastic component. In Section 9.5, we have shown that a desired state space model can be obtained by combining realizations for the deterministic and stochastic components, resulting in a state space model with a block structure in which the plant and the noise model have independent parametrizations.


• In Section 9.6, a theoretical analysis of deterministic and stochastic realization methods based on finite input-output data is made, and in Section 9.7, the ORT methods of identifying the deterministic and stochastic subsystems are developed by using the LQ decomposition and the SVD. Since the present algorithms are derived from the basic stochastic realization methods in the presence of exogenous inputs, they are different from those of MOESP [171-173] and N4SID [164, 165]. Some numerical results are included in Section 9.8; for further simulation studies, see [29, 30], [93], and for theoretical analyses of ill-conditioning of subspace estimates, see [31, 32].

• In Section 9.10, proofs of Theorem 9.1 and Lemma 9.7 are included.


9.10 Appendix: Proofs of Theorem and Lemma

9.10.1 Proof of Theorem 9.1

1° Proof of (i) ⇒ (ii). Rewriting (9.8), we have

$$\begin{bmatrix} y \\ u \end{bmatrix} = \begin{bmatrix} A(z) & B(z) \\ 0 & D(z) \end{bmatrix}\begin{bmatrix} \nu \\ \eta \end{bmatrix}$$

where $\nu$ and $\eta$ are zero mean white noise vectors with covariance matrices $Q_1$ and $Q_2$, respectively. By assumption, $\Gamma(z)$ is stable, so are $A(z)$, $B(z)$ and $D(z)$. Also, we have

$$\Gamma^{-1}(z) = \begin{bmatrix} A^{-1}(z) & -A^{-1}(z)B(z)D^{-1}(z) \\ 0 & D^{-1}(z) \end{bmatrix}$$

and $\Gamma^{-1}(z)$ is stable, so are $A^{-1}(z)$ and $D^{-1}(z)$. Thus, in particular, $D(z)$ is of minimal phase. Hence, we see that $u = D(z)\eta$ is the innovation representation for $u$, and $\eta$ is the innovation process for $u$.
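Define

$$\mathbf{H} = \overline{\operatorname{span}}\{\eta(\tau) \mid \tau = 0, \pm 1, \ldots\}, \quad \mathbf{H}^-_t = \overline{\operatorname{span}}\{\eta(\tau) \mid \tau < t\}$$

Since $D(z)$ is of minimal phase, we have $\mathbf{U} = \mathbf{H}$ and $\mathbf{U}^-_t = \mathbf{H}^-_t$. Moreover, we get

$$E\Big\{\sum_{i=0}^{\infty} B_i\eta(t-i) \,\Big|\, \mathbf{U}\Big\} = \sum_{i=0}^{\infty} B_i\eta(t-i) = E\Big\{\sum_{i=0}^{\infty} B_i\eta(t-i) \,\Big|\, \mathbf{U}^-_{t+1}\Big\}$$

Noting that $\nu \perp \mathbf{H}$, we have

$$\begin{aligned}
E\{y(t) \mid \mathbf{U}\} &= E\Big\{\sum_{i=0}^{\infty} A_i\nu(t-i) + \sum_{i=0}^{\infty} B_i\eta(t-i) \,\Big|\, \mathbf{U}\Big\} = \sum_{i=0}^{\infty} B_i\eta(t-i) \\
&= E\Big\{\sum_{i=0}^{\infty} A_i\nu(t-i) + \sum_{i=0}^{\infty} B_i\eta(t-i) \,\Big|\, \mathbf{U}^-_{t+1}\Big\} = E\{y(t) \mid \mathbf{U}^-_{t+1}\}
\end{aligned}$$

as was to be proved.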
2° Proof of (ii) ⇒ (iii). From (ii), it follows that

$$y_d(t) = E\{y(t) \mid \mathbf{U}^-_{t+1}\} = \sum_{i=0}^{\infty} B_i\eta(t-i) \;\Longrightarrow\; y_d = B(z)\eta$$

so that from $u = D(z)\eta$, we have

$$y_d = B(z)D^{-1}(z)u =: K(z)u$$

Also, from the stability of $B(z)$ and the minimal phase property of $D(z)$, we see that $K(z)$ is stable.
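We define $\zeta(t) := y(t) - E\{y(t) \mid \mathbf{U}^-_{t+1}\}$. Then,

$$y(t) = \sum_{i=0}^{\infty} K_iu(t-i) + \zeta(t)$$

We show that $\zeta(t)$ is orthogonal to $\mathbf{U}$. Since $E\{y(t) \mid \mathbf{U}^-_{t+1}\} = E\{y(t) \mid \mathbf{U}\}$ and $\mathbf{U}^-_{t+1} \subset \mathbf{U}^-_{t+h} \subset \mathbf{U}$, $h = 1, 2, \ldots$, we get

$$E\{y(t) \mid \mathbf{U}^-_{t+h}\} = E\{y(t) \mid \mathbf{U}^-_{t+1}\}, \quad h = 1, 2, \ldots$$

It therefore follows that

$$E\{y(t) \mid \mathbf{U}^-_{t+1}\} = E\Big\{\sum_{i=0}^{\infty} K_iu(t-i) + \zeta(t) \,\Big|\, \mathbf{U}^-_{t+h}\Big\} \tag{9.56}$$

where $h = 1, 2, \ldots$. Note that

$$E\Big\{\sum_{i=0}^{\infty} K_iu(t-i) \,\Big|\, \mathbf{U}^-_{t+h}\Big\} = \sum_{i=0}^{\infty} K_iu(t-i) = E\{y(t) \mid \mathbf{U}^-_{t+1}\}$$

Thus it follows from (9.56) that

$$E\{\zeta(t) \mid \mathbf{U}^-_{t+h}\} = E\{\zeta(t) \mid \mathbf{U}^-_{t+1}\} = 0, \quad h = 1, 2, \ldots$$

where $\zeta(t) \perp \mathbf{U}^-_{t+1}$ is used. Moreover, since $\mathbf{U}^-_{t+h} \subset \mathbf{U}^-_{t+1}$, $h = 0, -1, \ldots$, we have $\zeta(t) \perp \mathbf{U}^-_{t+h}$, $h = 0, -1, \ldots$. Thus it follows that

$$E\{\zeta(t) \mid \mathbf{U}^-_{t+h}\} = 0, \quad h = 0, \pm 1, \ldots$$

This implies that $\zeta$ is orthogonal to $\mathbf{U}$.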
Recall that $\zeta := y - y_d$ is the difference of two regular full rank stationary processes, and hence it is also a regular full rank stationary process. Let the innovation representation of $\zeta$ be

$$\zeta(t) = \sum_{i=0}^{\infty} L_i\nu(t-i), \quad L_0 = I_p$$

where $\nu \in \mathbb{R}^p$ is a zero mean white noise vector with covariance matrix $Q_1$, and where $L(z) := \sum_{i=0}^{\infty} L_iz^{-i}$ has full rank and is of minimal phase. Hence, $y$ can be expressed as (9.10), where $\nu$ is uncorrelated with $u$ due to the fact that $\zeta \perp u$. Since $K(z)$ is stable, and since $L(z)$ is of minimal phase and has full rank, the proof of (iii) is completed.

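3° Proof of (iii) ⇒ (iv). Let the Hilbert space generated by the past of $\zeta$ be given by $\mathbf{Z}^-_t = \overline{\operatorname{span}}\{\zeta(\tau) \mid \tau < t\}$. Since $\zeta \perp u$,

$$\mathbf{U}^-_t \vee \mathbf{Y}^-_t = \mathbf{U}^-_t \oplus \mathbf{Z}^-_t$$

From (iii), it follows that $u(t+h) \perp \mathbf{Z}^-_t$, $h = 0, 1, \ldots$, so that

$$\begin{aligned}
E\{u(t+h) \mid \mathbf{U}^-_t \vee \mathbf{Y}^-_t\} &= E\{u(t+h) \mid \mathbf{U}^-_t \oplus \mathbf{Z}^-_t\} \\
&= E\{u(t+h) \mid \mathbf{U}^-_t\}, \quad h = 0, 1, \ldots
\end{aligned}$$

This implies that (9.12) holds.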


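4° Proof of (iv) ⇒ (i). We define $\zeta(t) := y(t) - E\{y(t) \mid \mathbf{U}_{t+1}^-\}$, writing $E\{y(t) \mid \mathbf{U}_{t+1}^-\} = \sum_{i=0}^{\infty} K_i u(t-i)$, and let $\mathbf{Z}_t^- := \operatorname{span}\{\zeta(\tau) \mid \tau < t\}$. From (iv), we have
$$E\{u(t+h) \mid \mathbf{U}_t^-\} = E\{u(t+h) \mid \mathbf{U}_t^- \vee \mathbf{Y}_t^-\} = E\{u(t+h) \mid \mathbf{U}_t^- \oplus \mathbf{Z}_t^-\}, \quad h = 0, 1, \dots$$
Thus it follows that $E\{u(t+h) \mid \mathbf{Z}_t^-\} = 0$, $h = 0, 1, \dots$. By the definition of $\zeta(t)$, $E\{u(t+h) \mid \mathbf{Z}_t^-\} = 0$, $h = -1, -2, \dots$, also holds, implying that $u$ is orthogonal to $\zeta$.
Let the innovation representation of $\zeta$ be given by
$$\zeta(t) = \sum_{i=0}^{\infty} A_i v(t-i), \quad A_0 = I_p$$
where $v$ is a zero mean white noise vector with covariance matrix $\Omega_1$. Hence we have
$$y(t) = \sum_{i=0}^{\infty} A_i v(t-i) + \sum_{i=0}^{\infty} K_i u(t-i)$$
Let the innovation representation of $u$ be given by
$$u(t) = \sum_{i=0}^{\infty} D_i \eta(t-i), \quad t = 0, \pm 1, \dots \tag{9.57}$$
where $\eta \in \mathbb{R}^m$ is a zero mean white noise vector with covariance matrix $\Omega_2$, and where $D(z) = \sum_{i=0}^{\infty} D_i z^{-i}$ is of minimal phase. It follows from (9.57) that $u = D(z)\eta$, so that
$$y(t) = \sum_{i=0}^{\infty} A_i v(t-i) + \sum_{i=0}^{\infty} B_i \eta(t-i) \tag{9.58}$$
where $B(z) = K(z) D(z)$. From (9.57) and (9.58), $\Gamma(z)$ has a block upper triangular structure, implying that there is no feedback from $y$ to $u$.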


9.10.2 Proof of Lemma 9.7 


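First we show that $\mathbf{X}_t^{+/-}$ is an oblique splitting subspace for $(\hat{\mathbf{Y}}_t^+, \mathbf{U}_t^-)$. Clearly, from (9.27), any $\eta \in \hat{\mathbf{Y}}_t^+$ is expressed as
$$\eta = \bar{\eta} + \xi, \quad \bar{\eta} \in \mathbf{Y}_t^{d+}, \quad \xi \in \mathbf{X}_t^{+/-}$$
Since $\mathbf{X}_t^{+/-}$ is an oblique splitting subspace for $(\mathbf{Y}_t^{d+}, \mathbf{U}_t^-)$, and since $\xi \in \mathbf{X}_t^{+/-}$ is included in $\mathbf{U}_t^-$, we have
$$\begin{aligned} \hat{E}_{\|\mathbf{U}_t^+}\{\eta \mid \mathbf{U}_t^-\} &= \hat{E}_{\|\mathbf{U}_t^+}\{\bar{\eta} \mid \mathbf{U}_t^-\} + \hat{E}_{\|\mathbf{U}_t^+}\{\xi \mid \mathbf{U}_t^-\} \\ &= \hat{E}_{\|\mathbf{U}_t^+}\{\bar{\eta} \mid \mathbf{X}_t^{+/-}\} + \hat{E}_{\|\mathbf{U}_t^+}\{\xi \mid \mathbf{X}_t^{+/-}\} \\ &= \hat{E}_{\|\mathbf{U}_t^+}\{\eta \mid \mathbf{X}_t^{+/-}\} \end{aligned}$$
Hence
$$\begin{aligned} \hat{E}_{\|\mathbf{U}_t^+}\{\hat{\mathbf{Y}}_t^+ \mid \mathbf{U}_t^-\} &= \operatorname{span}\{\hat{E}_{\|\mathbf{U}_t^+}\{\eta \mid \mathbf{U}_t^-\} \mid \eta \in \hat{\mathbf{Y}}_t^+\} \\ &= \operatorname{span}\{\hat{E}_{\|\mathbf{U}_t^+}\{\eta \mid \mathbf{X}_t^{+/-}\} \mid \eta \in \hat{\mathbf{Y}}_t^+\} \\ &= \hat{E}_{\|\mathbf{U}_t^+}\{\hat{\mathbf{Y}}_t^+ \mid \mathbf{X}_t^{+/-}\} \end{aligned}$$
This proves the first statement.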
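From (9.26) and (9.27), we have $\mathbf{X}_t^{+/-} \subset \mathbf{U}_t^-$ and $\mathbf{X}_t^{+/-} \subset \hat{\mathbf{Y}}_t^+$, so that $\mathbf{X}_t^{+/-} \subset (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-)$ holds. Conversely, suppose that $\eta \in \hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-$ holds. Then, we have
$$\eta \in \hat{\mathbf{Y}}_t^+ = \mathbf{Y}_t^{d+} \vee \mathbf{X}_t^{+/-}, \qquad \eta \in \mathbf{U}_t^-$$
Let $\eta = \eta_1 + \eta_2$, where $\eta_1 \in \mathbf{Y}_t^{d+}$ and $\eta_2 \in \mathbf{X}_t^{+/-} \subset \mathbf{U}_t^-$. Since $\eta \in \mathbf{U}_t^-$, we have $\eta_1 = \eta - \eta_2 \in \mathbf{U}_t^-$. However, since $\eta_1 \in \mathbf{Y}_t^{d+}$, it also follows from (9.26) that
$$\eta_1 = \hat{E}_{\|\mathbf{U}_t^+}\{\eta_1 \mid \mathbf{U}_t^-\} \in \mathbf{X}_t^{+/-}$$
Thus, we have $\eta = \eta_1 + \eta_2 \in \mathbf{X}_t^{+/-}$. This completes the proof of (9.28).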
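We now prove (9.29). The following inclusion relation clearly holds:
$$\hat{\mathbf{Y}}_t^+ \supset (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-) + (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^+) \tag{9.59}$$
because the two sets on the right-hand side of the above relation are included in the set on the left-hand side. We show the converse. Since $\hat{\mathbf{Y}}_t^+ = \mathbf{Y}_t^{d+} \vee \mathbf{X}_t^{+/-}$, it suffices to show that
$$\mathbf{Y}_t^{d+} \subset (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-) + (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^+) \tag{9.60}$$
$$\mathbf{X}_t^{+/-} \subset (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-) + (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^+) \tag{9.61}$$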


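It follows from Lemma 9.4 that
$$y_d(t+h) = \sum_{i=-\infty}^{t+h} G_{t+h-i} u(i) = y_d^-(t+h) + y_d^+(t+h)$$
where
$$y_d^-(t+h) = \sum_{i=-\infty}^{t-1} G_{t+h-i} u(i) \in \mathbf{U}_t^- \tag{9.62a}$$
$$y_d^+(t+h) = \sum_{i=t}^{t+h} G_{t+h-i} u(i) \in \mathbf{U}_t^+ \tag{9.62b}$$
and where $\mathbf{U}_t^- \cap \mathbf{U}_t^+ = \{0\}$. Thus $y_d^-(t+h)$ is the oblique projection of $y_d(t+h)$ onto the past $\mathbf{U}_t^-$ along the future $\mathbf{U}_t^+$, so that it belongs to $\mathbf{X}_t^{+/-}$. Thus we get $y_d^-(t+h) \in \mathbf{X}_t^{+/-} = \hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-$. Also, we have
$$y_d^+(t+h) = y_d(t+h) - y_d^-(t+h) \in \hat{\mathbf{Y}}_t^+$$
Thus, from (9.62b), the relation $y_d^+(t+h) \in \hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^+$ holds. Therefore, it follows that
$$y_d(t+h) = y_d^-(t+h) + y_d^+(t+h) \in (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-) + (\hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^+)$$
Since $\operatorname{span}\{y_d(t+h) \mid h = 0, 1, \dots\} = \mathbf{Y}_t^{d+}$, we have (9.60). Also, (9.61) trivially holds, since $\mathbf{X}_t^{+/-} = \hat{\mathbf{Y}}_t^+ \cap \mathbf{U}_t^-$ from (9.28).
This completes the proof of Lemma 9.7.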


10 
Subspace Identification (2) - CCA 


In this chapter, we consider the stochastic realization problem in the presence of exogenous inputs by extending the CCA-based approach. The oblique projection of the future outputs onto the space of the past observations along the space of the future inputs is factorized as the product of the extended observability matrix and the state vector. In terms of the state vector and the future inputs, we then derive an optimal predictor of the future outputs, which leads to a forward innovation model for the output process in the presence of exogenous inputs. The basic step of the realization procedure is a factorization of the conditional covariance matrix of the future outputs and the past input-output data given the future inputs; this factorization can easily be adapted to finite input-output data by using the LQ decomposition. We derive two stochastic subspace identification algorithms and explain their relation to the N4SID method. Some comparative simulation results with the ORT and PO-MOESP methods are also included.


10.1 Stochastic Realization with Exogenous Inputs 


Consider the stochastic system shown in Figure 10.1, where $u \in \mathbb{R}^m$ is the exogenous input, $y \in \mathbb{R}^p$ the output vector, and $\xi$ the stochastic disturbance, which is not observable. We assume that $\{u(t), y(t),\ t = 0, \pm 1, \dots\}$ are zero mean second-order stationary stochastic processes, and that the joint input-output process $(u, y)$ is of full rank and regular.


Figure 10.1. Stochastic system with exogenous input 
The stochastic realization problem considered in this chapter is the same as the
one studied in Chapter 9, which is restated below. 


Stochastic Realization Problem 


Under the assumption that the infinite data $\{u(t), y(t),\ t = 0, \pm 1, \dots\}$ are given, we define a suitable state vector $x$ with minimal dimension and derive a state space model with the input vector $u$ and the output vector $y$ of the form
$$x(t+1) = A x(t) + B u(t) + K e(t) \tag{10.1a}$$
$$y(t) = C x(t) + D u(t) + e(t) \tag{10.1b}$$
where $e$ is an innovation process.


In this chapter, we shall present a stochastic realization method in the presence of an exogenous input by means of the CCA-based approach, extending the CCA method developed in Chapter 8 to the present case. In the absence of feedback from $y$ to $u$, we derive a predictor space for the output process $y$, leading to a minimal causal stationary realization of the process $y$ with the exogenous process $u$ as an input. The basic idea is to derive a multi-stage Wiener predictor of the future output in terms of the past input-output and the future inputs, where an important point is to define an appropriate causal state vector.

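Let $t$ be the present time, and $k$ a positive integer. Then, we define the future vectors
$$f(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+k-1) \end{bmatrix}, \qquad u_+(t) = \begin{bmatrix} u(t) \\ u(t+1) \\ \vdots \\ u(t+k-1) \end{bmatrix}$$
and the past vectors
$$y_-(t) = \begin{bmatrix} y(t-1) \\ y(t-2) \\ \vdots \end{bmatrix}, \qquad u_-(t) = \begin{bmatrix} u(t-1) \\ u(t-2) \\ \vdots \end{bmatrix}$$
We further define
$$p(t) = \begin{bmatrix} w(t-1) \\ w(t-2) \\ \vdots \end{bmatrix}, \qquad w(t) = \begin{bmatrix} u(t) \\ y(t) \end{bmatrix}$$
where $w \in \mathbb{R}^d$, $d := p + m$, is the joint input-output process. It should be noted that the future vectors $f(t) \in \mathbb{R}^{kp}$ and $u_+(t) \in \mathbb{R}^{km}$ are finite dimensional, but the past input-output vector $p(t)$ is infinite dimensional.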
The notation in this chapter is the same as that of Chapter 9. The linear spaces generated by the past of $w$, the past of $y$ and the future of $u$ are respectively denoted by
$$\mathbf{P}_t^- = \operatorname{span}\{w(\tau) \mid \tau < t\}, \quad \mathbf{Y}_t^- = \operatorname{span}\{y(\tau) \mid \tau < t\}, \quad \mathbf{U}_t^+ = \operatorname{span}\{u(\tau) \mid \tau \ge t\}$$
We also assume that these spaces are complete with respect to the mean-square norm $\|x\|_{\mathbf{H}} = \sqrt{E\{\|x\|^2\}}$, so that $\mathbf{P}_t^-$, $\mathbf{Y}_t^-$ and $\mathbf{U}_t^+$ can be regarded as subspaces of an ambient Hilbert space $\mathbf{H} = \mathbf{U} \vee \mathbf{Y}$ that includes all linear functionals of the joint input-output process $(u, y)$.

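Let $\mathbf{B}$ be a subspace of the Hilbert space $\mathbf{H}$. Then the orthogonal projection of a vector $a \in \mathbf{H}$ onto $\mathbf{B}$ is denoted by $E\{a \mid \mathbf{B}\}$. If $\mathbf{B}$ is generated by a vector $b$, then the orthogonal projection is expressed as
$$E\{a \mid \mathbf{B}\} = E\{a b^T\} \big(E\{b b^T\}\big)^\dagger b = \Sigma_{ab} \Sigma_{bb}^\dagger b =: E(a \mid b)$$
where $\Sigma_{ab} := E\{a b^T\}$ is the covariance matrix of the two random vectors $a$ and $b$, and $(\cdot)^\dagger$ is the pseudo-inverse. Let $\mathbf{B}^\perp$ be the orthogonal complement of $\mathbf{B} \subset \mathbf{H}$. Then, the orthogonal projection of $a$ onto $\mathbf{B}^\perp$ is given by
$$E\{a \mid \mathbf{B}^\perp\} := a - E\{a \mid \mathbf{B}\}$$
If $\mathbf{B}$ is generated by a random vector $b$, then we write $E\{a \mid \mathbf{B}^\perp\} = E(a \mid b^\perp)$. For the oblique projection, see Section 9.1.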
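As an illustration that is not part of the original development, the projection formula can be evaluated with sample averages standing in for the expectations. The sketch below assumes NumPy and synthetic data; the helper name `orth_proj_coeff` is hypothetical. It checks that the residual $a - E(a \mid b)$, i.e. $E(a \mid b^\perp)$, is orthogonal to $b$ in the sample sense.

```python
import numpy as np

def orth_proj_coeff(a, b):
    """Coefficient Sigma_ab Sigma_bb^+ of E(a | b), with sample averages in place of E{.}.

    a: (dim_a, N) and b: (dim_b, N) arrays whose columns are zero-mean samples.
    """
    N = a.shape[1]
    S_ab = a @ b.T / N
    S_bb = b @ b.T / N
    return S_ab @ np.linalg.pinv(S_bb)     # pseudo-inverse, as in the formula above

rng = np.random.default_rng(0)
N = 20000
b = rng.standard_normal((2, N))
a = np.array([[1.0, -2.0]]) @ b + 0.1 * rng.standard_normal((1, N))   # synthetic data

coeff = orth_proj_coeff(a, b)
resid = a - coeff @ b                      # E(a | b^perp) = a - E(a | b)
print(coeff)                               # close to [[1, -2]]
print(np.abs(resid @ b.T / N).max())       # essentially zero: the residual is orthogonal to b
```

Using the pseudo-inverse keeps the sketch well defined even when the sample covariance of $b$ is singular, mirroring the $(\cdot)^\dagger$ in the formula.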
We begin with a simple result on the conditional covariance matrices. 


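Lemma 10.1. For three random vectors $y, a, b \in \mathbf{H}$, we define the conditional covariance matrix
$$\Sigma_{ya|b} = E\{E(y \mid b^\perp) E(a \mid b^\perp)^T\} \tag{10.2}$$
Then, it follows that
$$\Sigma_{ya|b} = \Sigma_{ya} - \Sigma_{yb} \Sigma_{bb}^{-1} \Sigma_{ba} \tag{10.3}$$
where $\Sigma_{bb}$ is assumed to be nonsingular.

Proof. By definition,
$$E(y \mid b^\perp) = y - \Sigma_{yb} \Sigma_{bb}^{-1} b, \qquad E(a \mid b^\perp) = a - \Sigma_{ab} \Sigma_{bb}^{-1} b$$
Substituting these relations into (10.2) yields
$$\Sigma_{ya|b} = E\{[y - \Sigma_{yb} \Sigma_{bb}^{-1} b][a - \Sigma_{ab} \Sigma_{bb}^{-1} b]^T\}$$
Rearranging the terms, we get (10.3) [see also Lemma 5.1].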
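A quick numerical check of Lemma 10.1, assuming NumPy and synthetic correlated samples: the covariance of the residuals in definition (10.2) coincides with the Schur-complement formula (10.3). With sample second moments in place of expectations the identity is exact up to rounding. The helper names `cond_cov` and `perp` are hypothetical.

```python
import numpy as np

def cond_cov(S, iy, ia, ib):
    """Sigma_{ya|b} = Sigma_ya - Sigma_yb Sigma_bb^{-1} Sigma_ba of (10.3).

    S is a joint covariance matrix; iy, ia, ib are index lists selecting y, a and b.
    """
    S_yb = S[np.ix_(iy, ib)]
    S_bb = S[np.ix_(ib, ib)]
    S_ba = S[np.ix_(ib, ia)]
    return S[np.ix_(iy, ia)] - S_yb @ np.linalg.solve(S_bb, S_ba)

def perp(v, b):
    """Sample version of E(v | b^perp): subtract the least-squares projection of v onto b."""
    return v - (v @ b.T) @ np.linalg.solve(b @ b.T, b)

rng = np.random.default_rng(1)
N = 5000
X = rng.standard_normal((5, 5)) @ rng.standard_normal((5, N))   # correlated samples
y, a, b = X[0:2], X[2:4], X[4:5]
S = X @ X.T / N                                                  # sample second moments of (y, a, b)

direct = perp(y, b) @ perp(a, b).T / N      # definition (10.2) with sample averages
formula = cond_cov(S, [0, 1], [2, 3], [4])  # right-hand side of (10.3)
print(np.allclose(direct, formula))         # True
```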
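Lemma 10.2. Consider three random vectors $y, a, b \in \mathbf{H}$ and two subspaces $\mathbf{A} := \operatorname{span}\{a\}$ and $\mathbf{B} := \operatorname{span}\{b\}$ with $\mathbf{A} \cap \mathbf{B} = \{0\}$. Then, we have
$$E\{y \mid \mathbf{A} \vee \mathbf{B}\} = \hat{E}_{\|\mathbf{B}}\{y \mid \mathbf{A}\} + \hat{E}_{\|\mathbf{A}}\{y \mid \mathbf{B}\} =: \Pi(y) a + \Psi(y) b \tag{10.4}$$
Also, define the conditional covariance matrices by
$$\begin{aligned} \Sigma_{aa|b} &= E\{E(a \mid b^\perp) E(a \mid b^\perp)^T\}, & \Sigma_{ya|b} &= E\{E(y \mid b^\perp) E(a \mid b^\perp)^T\} \\ \Sigma_{yb|a} &= E\{E(y \mid a^\perp) E(b \mid a^\perp)^T\}, & \Sigma_{bb|a} &= E\{E(b \mid a^\perp) E(b \mid a^\perp)^T\} \end{aligned}$$
Then, $\Pi(y)$ and $\Psi(y)$ satisfy the discrete Wiener-Hopf equations
$$\Pi(y) \Sigma_{aa|b} = \Sigma_{ya|b}, \qquad \Psi(y) \Sigma_{bb|a} = \Sigma_{yb|a} \tag{10.5}$$
where we note that if $\Sigma_{aa}$ and $\Sigma_{bb}$ are positive definite, so are $\Sigma_{aa|b}$ and $\Sigma_{bb|a}$.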


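Proof. The orthogonal projection in (10.4) is given by
$$E\{y \mid \mathbf{A} \vee \mathbf{B}\} = E\Big\{y \begin{bmatrix} a \\ b \end{bmatrix}^T\Big\} \Big(E\Big\{\begin{bmatrix} a \\ b \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}^T\Big\}\Big)^{-1} \begin{bmatrix} a \\ b \end{bmatrix}$$
Thus it follows that
$$E\{y \mid \mathbf{A} \vee \mathbf{B}\} = \begin{bmatrix} \Sigma_{ya} & \Sigma_{yb} \end{bmatrix} \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}^{-1} \begin{bmatrix} a \\ 0 \end{bmatrix} + \begin{bmatrix} \Sigma_{ya} & \Sigma_{yb} \end{bmatrix} \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{10.6}$$
We show that the first term on the right-hand side of the above equation is the oblique projection of $y$ onto $\mathbf{A}$ along $\mathbf{B}$. Recall the inversion formula for a block matrix:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} \Delta^{-1} & -\Delta^{-1} B D^{-1} \\ -D^{-1} C \Delta^{-1} & D^{-1} + D^{-1} C \Delta^{-1} B D^{-1} \end{bmatrix}$$
Putting $A = \Sigma_{aa}$, $B = C^T = \Sigma_{ab}$ and $D = \Sigma_{bb}$, we have $\Delta := A - B D^{-1} C = \Sigma_{aa|b}$. Thus, from (10.4) and (10.6),
$$\begin{aligned} \Pi(y) &= \begin{bmatrix} \Sigma_{ya} & \Sigma_{yb} \end{bmatrix} \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}^{-1} \begin{bmatrix} I \\ 0 \end{bmatrix} = \begin{bmatrix} \Sigma_{ya} & \Sigma_{yb} \end{bmatrix} \begin{bmatrix} \Sigma_{aa|b}^{-1} \\ -\Sigma_{bb}^{-1} \Sigma_{ba} \Sigma_{aa|b}^{-1} \end{bmatrix} \\ &= (\Sigma_{ya} - \Sigma_{yb} \Sigma_{bb}^{-1} \Sigma_{ba}) \Sigma_{aa|b}^{-1} = \Sigma_{ya|b} \Sigma_{aa|b}^{-1} \end{aligned}$$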
This proves the first relation in (10.5). Similarly, we can show that the second relation holds.
Now let $v = V a$ for a constant matrix $V$ of compatible size. Note that $v \in \mathbf{A} = \operatorname{span}\{a\}$. Then, we have
$$\Pi(v) a = \Sigma_{va|b} \Sigma_{aa|b}^{-1} a = V \Sigma_{aa|b} \Sigma_{aa|b}^{-1} a = V a = v$$
Hence, $\Pi(\cdot)$ reproduces every element of $\mathbf{A}$. Also, putting $z = Z b$ for a constant matrix $Z$ of compatible size and noting that $\Sigma_{za|b} = Z \Sigma_{ba|b} = 0$, we get
$$\Pi(z) a = \Sigma_{za|b} \Sigma_{aa|b}^{-1} a = 0$$
so that $\Pi(\cdot)$ annihilates any element of $\mathbf{B} = \operatorname{span}\{b\}$. It therefore follows that $\Pi(y) a$ is the oblique projection of $y$ onto $\mathbf{A}$ along $\mathbf{B}$. In the same way, we can show that $\Psi(y) b$ is the oblique projection of $y$ onto $\mathbf{B}$ along $\mathbf{A}$.

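Suppose that $\Sigma_{aa}$ and $\Sigma_{bb}$ are positive definite. Then, the positive definiteness of the conditional covariance matrices $\Sigma_{aa|b}$ and $\Sigma_{bb|a}$ follows from $\mathbf{A} \cap \mathbf{B} = \{0\}$. In fact, if $\eta^T \Sigma_{aa|b} \eta = 0$ holds for some $\eta \ne 0$, then
$$0 = \eta^T \Sigma_{aa|b} \eta = \eta^T E\{[a - E\{a \mid b\}][a - E\{a \mid b\}]^T\} \eta = E\{[\eta^T (a - E\{a \mid b\})]^2\}$$
This implies $\eta^T a = \eta^T E\{a \mid b\} = \eta^T \Sigma_{ab} \Sigma_{bb}^{-1} b \in \mathbf{B}$. Since also $\eta^T a \in \mathbf{A}$, we get $\eta^T a \in \mathbf{A} \cap \mathbf{B} = \{0\}$, so that $\eta^T \Sigma_{aa} \eta = E\{(\eta^T a)^2\} = 0$, which contradicts $\Sigma_{aa} > 0$. This proves the positivity of $\Sigma_{aa|b}$. Similarly, we can prove the positivity of $\Sigma_{bb|a}$.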
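The argument of Lemma 10.2 is purely algebraic, so it can be checked numerically with sample covariances. The sketch below (NumPy, synthetic data, hypothetical helper names `cond` and `oblique_coeffs`) solves the Wiener-Hopf equations (10.5) for $\Pi(y)$ and $\Psi(y)$, confirms that the two oblique projections add up to the orthogonal projection onto $\mathbf{A} \vee \mathbf{B}$, and checks that $\Pi(\cdot)$ reproduces elements of $\mathbf{A}$ while $\Psi(\cdot)$ annihilates them.

```python
import numpy as np

def cond(S_xy, S_xb, S_bb, S_by):
    """Conditional covariance Sigma_{xy|b} = Sigma_xy - Sigma_xb Sigma_bb^{-1} Sigma_by (Lemma 10.1)."""
    return S_xy - S_xb @ np.linalg.solve(S_bb, S_by)

def oblique_coeffs(y, a, b):
    """Solve the Wiener-Hopf equations (10.5) for Pi(y) and Psi(y) using sample covariances."""
    N = y.shape[1]
    cov = lambda u, v: u @ v.T / N
    S_aa_b = cond(cov(a, a), cov(a, b), cov(b, b), cov(b, a))
    S_ya_b = cond(cov(y, a), cov(y, b), cov(b, b), cov(b, a))
    S_bb_a = cond(cov(b, b), cov(b, a), cov(a, a), cov(a, b))
    S_yb_a = cond(cov(y, b), cov(y, a), cov(a, a), cov(a, b))
    Pi = np.linalg.solve(S_aa_b.T, S_ya_b.T).T     # Pi Sigma_aa|b = Sigma_ya|b
    Psi = np.linalg.solve(S_bb_a.T, S_yb_a.T).T    # Psi Sigma_bb|a = Sigma_yb|a
    return Pi, Psi

rng = np.random.default_rng(2)
N = 4000
a = rng.standard_normal((2, N))
b = 0.8 * a[:1] + rng.standard_normal((1, N))   # b is correlated with a but not in span{a}
y = rng.standard_normal((3, N))

Pi, Psi = oblique_coeffs(y, a, b)
ab = np.vstack([a, b])
orth = (y @ ab.T) @ np.linalg.solve(ab @ ab.T, ab)   # orthogonal projection onto A v B
print(np.allclose(Pi @ a + Psi @ b, orth))         # True: the decomposition (10.4)

V = np.array([[2.0, -1.0]])
Pi_v, Psi_v = oblique_coeffs(V @ a, a, b)           # v = V a lies in A
print(np.allclose(Pi_v, V), np.allclose(Psi_v, 0))  # True True
```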


10.2 Optimal Predictor 


In this section, we consider the problem of predicting the future $f(t)$ by means of the past input-output $p(t)$ and the future input $u_+(t)$. In the following, we need two assumptions introduced in Chapter 9.


Assumption 10.1. There is no feedback from the output y to the input u. 


Assumption 10.2. For each $t$, the input space $\mathbf{U}$ has the direct sum decomposition
$$\mathbf{U} = \mathbf{U}_t^- + \mathbf{U}_t^+ \qquad (\mathbf{U}_t^- \cap \mathbf{U}_t^+ = \{0\}) \tag{10.7}$$
This is equivalent to the spectral density of the input $u$ being positive definite on the unit circle, i.e.,
$$\Phi_u(\omega) \ge c I_m \ \text{ for all } \omega, \ \text{ for some } c > 0 \tag{10.8}$$
This is also equivalent to the canonical angles between the past and future being positive.


As mentioned in Chapter 9, the condition of Assumption 10.2 may be too strong to be satisfied in practice; it suffices, however, to assume that the input satisfies a PE condition of sufficiently high order and that the real system is finite dimensional.
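For a scalar input, condition (10.8) reads $\Phi_u(\omega) \ge c > 0$ for all $\omega$. The sketch below is an illustration under stated assumptions: NumPy, an input generated by a moving-average filter $u(t) = \sum_i c_i e(t-i)$ of unit-variance white noise, and a hypothetical helper `ma_spectrum`. It evaluates the spectral density on a frequency grid. A filter zero inside the unit circle keeps the spectrum bounded away from zero, whereas a zero on the unit circle violates (10.8).

```python
import numpy as np

def ma_spectrum(c, sigma2=1.0, n_grid=2048):
    """Spectral density sigma2 |C(e^{jw})|^2 of u(t) = sum_i c[i] e(t-i) on a grid over [-pi, pi]."""
    w = np.linspace(-np.pi, np.pi, n_grid)
    C = sum(ci * np.exp(-1j * w * i) for i, ci in enumerate(c))
    return w, sigma2 * np.abs(C) ** 2

_, phi_ok = ma_spectrum([1.0, 0.5])    # zero of C(z) at z = -0.5, inside the unit circle
_, phi_bad = ma_spectrum([1.0, -1.0])  # zero of C(z) at z = 1, on the unit circle
print(phi_ok.min())    # about 0.25: (10.8) holds with c = 0.25
print(phi_bad.min())   # close to 0: (10.8) fails near omega = 0
```

A grid check like this can only suggest, not prove, that (10.8) holds; for a rational spectrum the exact minimum follows from the location of the zeros, as noted in the comments.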

The following theorem gives a solution to a multi-stage Wiener problem for predicting the future outputs based on the joint past input-output and the future inputs.
Figure 10.2. Oblique projection


Theorem 10.1. Suppose that Assumptions 10.1 and 10.2 are satisfied. Then, the optimal predictor of the future $f(t)$ based on the past $p(t)$ and the future $u_+(t)$ is given by
$$\hat{f}(t \mid t) = E\{f(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\} = \Pi p(t) + \Psi u_+(t) \tag{10.9}$$
where $\Pi p(t)$ denotes the oblique projection of $f(t)$ onto $\mathbf{P}_t^-$ along $\mathbf{U}_t^+$, and $\Psi u_+(t)$ is the oblique projection of $f(t)$ onto $\mathbf{U}_t^+$ along $\mathbf{P}_t^-$, as shown in Figure 10.2. Moreover, $\Pi$ and $\Psi$ respectively satisfy the discrete Wiener-Hopf equations
$$\Pi \Sigma_{pp|u} = \Sigma_{fp|u}, \qquad \Psi \Sigma_{uu|p} = \Sigma_{fu|p} \tag{10.10}$$


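Proof. If we can prove that $\mathbf{P}_t^- \cap \mathbf{U}_t^+ = \{0\}$, then we see from Lemma 10.2 that the orthogonal projection $E\{f(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\}$ is given by the direct sum of two oblique projections. Thus it suffices to show that $\mathbf{U}_t^+ \cap \mathbf{P}_t^- = \{0\}$.

Let $\zeta \in \mathbf{U}_t^+ \cap \mathbf{P}_t^-$. Then, we have $\zeta \in \mathbf{U}_t^+$ and $\zeta \in \mathbf{P}_t^- = \mathbf{Y}_t^- \vee \mathbf{U}_t^-$. From the latter condition, there exist $\eta \in \mathbf{Y}_t^-$ and $\nu \in \mathbf{U}_t^-$ such that $\zeta = \eta + \nu$. Since there is no feedback from $y$ to $u$, it follows from (9.13) that
$$E\{\mathbf{Y}_t^- \mid \mathbf{U}\} = E\{\mathbf{Y}_t^- \mid \mathbf{U}_t^-\}, \quad t = 0, \pm 1, \dots$$
Hence, the orthogonal decomposition of $\eta \in \mathbf{Y}_t^-$ into the sum of deterministic and stochastic components gives $\eta = \eta_d + \eta_s$, where
$$\eta_d = E\{\eta \mid \mathbf{U}\} = E\{\eta \mid \mathbf{U}_t^-\}, \qquad \eta_s = E\{\eta \mid \mathbf{U}^\perp\}$$
Thus, it follows that $\zeta = (\nu + \eta_d) + \eta_s$ with $\nu + \eta_d \in \mathbf{U}_t^-$ and $\eta_s \perp \mathbf{U}$. Since $\zeta \in \mathbf{U}_t^+$, we get $\hat{E}_{\|\mathbf{U}_t^+}\{\zeta \mid \mathbf{U}_t^-\} = 0$, implying that
$$\hat{E}_{\|\mathbf{U}_t^+}\{(\nu + \eta_d) + \eta_s \mid \mathbf{U}_t^-\} = 0$$
However, since $\nu + \eta_d \in \mathbf{U}_t^-$ and since $\eta_s \perp \mathbf{U}$,
$$\hat{E}_{\|\mathbf{U}_t^+}\{(\nu + \eta_d) + \eta_s \mid \mathbf{U}_t^-\} = \hat{E}_{\|\mathbf{U}_t^+}\{\nu + \eta_d \mid \mathbf{U}_t^-\} = \nu + \eta_d = 0$$
Thus, $\zeta \in \mathbf{U}_t^+$ satisfies $\zeta = \eta_s \perp \mathbf{U}$, so that $\zeta = 0$, as was to be proved.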
For convenience, we attach an index $k$ to indicate that the matrices $\Pi \in \mathbb{R}^{kp \times \infty}$ and $\Psi \in \mathbb{R}^{kp \times km}$ have $k$ block rows, so that we write them $\Pi_k$ and $\Psi_k$, respectively. Thus, the terms on the right-hand side of (10.9) are expressed as
$$\Pi_k p(t) = \hat{E}_{\|\mathbf{U}_t^+}\{f(t) \mid \mathbf{P}_t^-\} \tag{10.11}$$
and
$$\Psi_k u_+(t) = \hat{E}_{\|\mathbf{P}_t^-}\{f(t) \mid \mathbf{U}_t^+\} \tag{10.12}$$
Next we show that, by using the oblique projections (10.11) and (10.12), we can construct a minimal dimensional state space model for the output $y$ which is causal with respect to the input $u$. By causal we mean here that the oblique projection (10.12) is causal, so that the operator $\Psi_k$ has a block lower triangular form.

The following lemma will give a representation of the orthogonal projection in 
Theorem 10.1. 


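Lemma 10.3. Suppose that Assumption 10.1 is satisfied. Then, from Theorem 10.1, we have
$$E\{y(t+h) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\} = E\{y(t+h) \mid \mathbf{P}_t^- \vee \mathbf{U}_{[t, t+h]}\} \tag{10.13}$$
where $h = 0, 1, \dots$, and where $\mathbf{U}_{[t, t+h]} = \operatorname{span}\{u(t), \dots, u(t+h)\}$. Also, $\Psi_k$ is given by
$$\Psi_k = \begin{bmatrix} G_0 & & & 0 \\ G_1 & G_0 & & \\ \vdots & \ddots & \ddots & \\ G_{k-1} & \cdots & G_1 & G_0 \end{bmatrix} \in \mathbb{R}^{kp \times km} \tag{10.14}$$
where $(G_0, G_1, \dots)$ are impulse response matrices. Hence, $\Psi_k$ is causal.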


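Proof. First, we show the following relation for conditional orthogonality:
$$\mathbf{A} \perp \mathbf{B} \mid \mathbf{C} \;\Rightarrow\; \mathbf{A} \perp \mathbf{B} \mid (\mathbf{A}_0 \vee \mathbf{C}), \quad \mathbf{A}_0 \subset \mathbf{A} \tag{10.15}$$
Indeed, since $\mathbf{A}_0 \subset \mathbf{A}$, $\mathbf{A} \perp \mathbf{B} \mid \mathbf{C}$ implies $\mathbf{A}_0 \perp \mathbf{B} \mid \mathbf{C}$. Thus the two relations $E\{\mathbf{B} \mid \mathbf{A} \vee \mathbf{C}\} = E\{\mathbf{B} \mid \mathbf{C}\}$ and $E\{\mathbf{B} \mid \mathbf{A}_0 \vee \mathbf{C}\} = E\{\mathbf{B} \mid \mathbf{C}\}$ hold from Lemma 9.2. This implies that
$$E\{\mathbf{B} \mid \mathbf{A} \vee \mathbf{C}\} = E\{\mathbf{B} \mid \mathbf{C}\} = E\{\mathbf{B} \mid \mathbf{A}_0 \vee \mathbf{C}\}$$
However, since the left-hand side can be written as $E\{\mathbf{B} \mid \mathbf{A} \vee (\mathbf{A}_0 \vee \mathbf{C})\}$, it follows that $E\{\mathbf{B} \mid \mathbf{A} \vee (\mathbf{A}_0 \vee \mathbf{C})\} = E\{\mathbf{B} \mid \mathbf{A}_0 \vee \mathbf{C}\}$, as was to be proved.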
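Since there is no feedback from $y$ to $u$, we see from Theorem 9.1 (iv) that
$$\mathbf{Y}_{t+h+1}^- \perp \mathbf{U}_{t+h+1}^+ \mid \mathbf{U}_{t+h+1}^-, \quad h = 0, 1, \dots \tag{10.16}$$
Putting $\mathbf{A}_0 = \mathbf{Y}_t^-$, $\mathbf{A} = \mathbf{Y}_{t+h+1}^-$, $\mathbf{B} = \mathbf{U}_{t+h+1}^+$ and $\mathbf{C} = \mathbf{U}_{t+h+1}^-$, it follows from (10.15) that
$$\mathbf{Y}_{t+h+1}^- \perp \mathbf{U}_{t+h+1}^+ \mid (\mathbf{Y}_t^- \vee \mathbf{U}_{t+h+1}^-) \tag{10.17}$$
Also, putting $\mathbf{B} = \mathbf{Y}_{t+h+1}^-$, $\mathbf{A} = \mathbf{U}_{t+h+1}^+$ and $\mathbf{C} = \mathbf{Y}_t^- \vee \mathbf{U}_{t+h+1}^-$ in (10.17), and noting that $\mathbf{U}_{t+h+1}^+ \vee \mathbf{U}_{t+h+1}^- = \mathbf{U}$, it follows from Lemma 9.2 that
$$E\{\mathbf{Y}_{t+h+1}^- \mid \mathbf{Y}_t^- \vee \mathbf{U}\} = E\{\mathbf{Y}_{t+h+1}^- \mid \mathbf{Y}_t^- \vee \mathbf{U}_{t+h+1}^-\}$$
Moreover, noting that $y(t+h) \in \mathbf{Y}_{t+h+1}^-$ and that $\mathbf{P}_t^- \vee \mathbf{U}_t^+ = \mathbf{Y}_t^- \vee \mathbf{U}$, we have
$$\begin{aligned} E\{y(t+h) \mid \mathbf{Y}_t^- \vee \mathbf{U}\} &= E\{y(t+h) \mid \mathbf{Y}_t^- \vee \mathbf{U}_{t+h+1}^-\} \\ &= E\{y(t+h) \mid \mathbf{P}_t^- \vee \mathbf{U}_{[t, t+h]}\} \end{aligned}$$
This proves (10.13).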
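We show that $\Psi_k$ is a causal operator. In fact, from (10.13),
$$\Psi_k u_+(t) = \begin{bmatrix} \hat{E}_{\|\mathbf{P}_t^-}\{y(t) \mid \mathbf{U}_{[t,t]}\} \\ \hat{E}_{\|\mathbf{P}_t^-}\{y(t+1) \mid \mathbf{U}_{[t,t+1]}\} \\ \vdots \\ \hat{E}_{\|\mathbf{P}_t^-}\{y(t+k-1) \mid \mathbf{U}_{[t,t+k-1]}\} \end{bmatrix}$$
Since $\mathbf{U}_{[t,t+h]} = \operatorname{span}\{u(t), \dots, u(t+h)\}$, we have
$$\hat{E}_{\|\mathbf{P}_t^-}\{y(t+h) \mid \mathbf{U}_{[t,t+h]}\} = G_h u(t) + G_{h-1} u(t+1) + \cdots + G_0 u(t+h)$$
so that $\Psi_k$ becomes a block lower triangular Toeplitz matrix. That the coefficient matrices $G_h$ do not depend on $t$ follows from the stationarity of the joint process $(u, y)$.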
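To make the structure of (10.14) concrete, the sketch below builds $\Psi_k$ from a list of impulse response matrices and checks, for a scalar example, that $\Psi_k u_+(t)$ equals the truncated convolution of the future inputs (i.e., the response to $u(t), \dots, u(t+k-1)$ alone). It assumes NumPy; the helper `block_toeplitz_lower` and the example system $y(t) = 0.5\,y(t-1) + u(t) + u(t-1)$ are hypothetical illustrations, not taken from the text.

```python
import numpy as np

def block_toeplitz_lower(G, k):
    """Build Psi_k of (10.14): block (i, j) is G[i - j] for i >= j, zero otherwise.

    G is a list of p x m impulse response matrices [G_0, G_1, ...] with len(G) >= k.
    """
    p, m = G[0].shape
    Psi = np.zeros((k * p, k * m))
    for i in range(k):
        for j in range(i + 1):
            Psi[i * p:(i + 1) * p, j * m:(j + 1) * m] = G[i - j]
    return Psi

# Hypothetical SISO example: y(t) = 0.5 y(t-1) + u(t) + u(t-1), so G_0 = 1, G_h = 1.5 * 0.5**(h-1)
g = [1.0] + [1.5 * 0.5 ** (h - 1) for h in range(1, 4)]
G = [np.array([[gh]]) for gh in g]
Psi = block_toeplitz_lower(G, 4)

u_plus = np.array([1.0, 0.0, 2.0, -1.0])   # u(t), ..., u(t+3)
# Psi_k u_+(t) is the truncated convolution sum_{i <= h} G_{h-i} u(t+i)
print(np.allclose(Psi @ u_plus, np.convolve(g, u_plus)[:4]))   # True
```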


In the next section, we shall define the state vector of the system with exogenous 
inputs in terms of the conditional CCA technique, where the conditional canonical 
correlations are defined as the canonical correlations between the future and the past 
after deleting the effects of future inputs from them. 


10.3 Conditional Canonical Correlation Analysis 


As shown in Chapters 7 and 8, a stochastic system without exogenous inputs is finite dimensional if and only if the covariance matrix of block Hankel type has finite rank. It should, however, be noted that $\Sigma_{fp|u}$ is not a block Hankel matrix, as shown in (10.18) below.

We introduce the conditional CCA in order to factorize the conditional covariance matrix $\Sigma_{fp|u}$ of the future and past given the future inputs. By stationarity, $\Sigma_{fp|u}$ is a $kp \times \infty$ semi-infinite block matrix whose rank is nondecreasing with respect to the future prediction horizon $k$. We then define a state vector to derive an innovation model for the system with exogenous inputs.
Suppose that the conditional covariance matrix $\Sigma_{fp|u}$ has finite rank, say $\operatorname{rank}(\Sigma_{fp|u}) = n$. It follows from Lemma 10.1 that the conditional covariance matrix is expressed as
$$\begin{aligned} \Sigma_{fp|u} &= E\{[f(t) - E\{f(t) \mid \mathbf{U}_t^+\}][p(t) - E\{p(t) \mid \mathbf{U}_t^+\}]^T\} \\ &= E\{E(f \mid u_+^\perp)(t)\, E(p \mid u_+^\perp)^T(t)\} \end{aligned} \tag{10.18}$$
Also, the conditional covariance matrices $\Sigma_{ff|u}$ and $\Sigma_{pp|u}$ are defined similarly. Let their Cholesky factorizations be given by
$$\Sigma_{ff|u} = L L^T, \qquad \Sigma_{pp|u} = M M^T$$
and define
$$\varepsilon_+(t) := L^{-1} E(f \mid u_+^\perp)(t), \qquad \varepsilon_-(t) := M^{-1} E(p \mid u_+^\perp)(t)$$
Then, we have
$$E\{\varepsilon_+(t) \varepsilon_-^T(t)\} = L^{-1} \Sigma_{fp|u} M^{-T}$$
where the right-hand side of the above equation is the normalized conditional covariance matrix.
Consider the SVD of the normalized conditional covariance matrix
$$L^{-1} \Sigma_{fp|u} M^{-T} = U \Sigma V^T \tag{10.19}$$
where $U^T U = I_n$, $V^T V = I_n$, and where $\Sigma$ is a diagonal matrix of the form
$$\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n), \qquad 1 \ge \sigma_1 \ge \cdots \ge \sigma_n > 0$$
We define two $n$-dimensional vectors as
$$\alpha(t) := V^T M^{-1} E(p \mid u_+^\perp)(t), \qquad \beta(t) := U^T L^{-1} E(f \mid u_+^\perp)(t) \tag{10.20}$$
Then it can be shown that
$$E\{\alpha(t)\alpha^T(t)\} = E\{\beta(t)\beta^T(t)\} = I_n, \qquad E\{\beta(t)\alpha^T(t)\} = \Sigma$$
Thus, comparing with the definition of canonical correlations in Subsection 8.5.1, we see that $\sigma_1, \dots, \sigma_n$ are the conditional canonical correlations between the future $f(t)$ and the past $p(t)$ given the future input $u_+(t)$. Also, $\alpha(t)$ and $\beta(t)$ are the corresponding conditional canonical vectors.

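According to the method of Subsection 8.5.1, the extended observability and reachability matrices are defined by
$$\mathcal{O}_k = L U \Sigma^{1/2}, \qquad \mathcal{C}_\infty = \Sigma^{1/2} V^T M^T \tag{10.21}$$
where $\operatorname{rank}(\mathcal{O}_k) = \operatorname{rank}(\mathcal{C}_\infty) = n$. Thus, from (10.19), the conditional covariance matrix $\Sigma_{fp|u}$ has the factorization
$$\Sigma_{fp|u} = (L U \Sigma^{1/2})(\Sigma^{1/2} V^T M^T) = \mathcal{O}_k \mathcal{C}_\infty$$
Define the $n$-dimensional state vector as
$$x(t) = \mathcal{C}_\infty \Sigma_{pp|u}^{-1} p(t) = \Sigma^{1/2} V^T M^{-1} p(t) \tag{10.22}$$
Then, it can be shown that $x(t)$ is a basis vector of the predictor space
$$\mathbf{X}_t^{+/-} := \hat{E}_{\|\mathbf{U}_t^+}\{\mathbf{Y}_t^+ \mid \mathbf{P}_t^-\} \tag{10.23}$$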


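In fact, it follows from (10.11) and (10.22) that the oblique projection of the future $f(t)$ onto the past $\mathbf{P}_t^-$ along the future inputs $\mathbf{U}_t^+$ is given by
$$\hat{E}_{\|\mathbf{U}_t^+}\{f(t) \mid \mathbf{P}_t^-\} = \Pi_k p(t) = \Sigma_{fp|u} \Sigma_{pp|u}^{-1} p(t) = \mathcal{O}_k x(t) \tag{10.24}$$
where the covariance matrix of $x$ is positive definite. Indeed, from (10.22) and $\Sigma_{pp} \ge \Sigma_{pp|u}$,
$$\begin{aligned} E\{x(t) x^T(t)\} &= \Sigma^{1/2} V^T M^{-1} \Sigma_{pp} M^{-T} V \Sigma^{1/2} \\ &\ge \Sigma^{1/2} V^T M^{-1} \Sigma_{pp|u} M^{-T} V \Sigma^{1/2} = \Sigma > 0 \end{aligned}$$
Note that if there are no exogenous inputs, the state covariance matrix is exactly the canonical correlation matrix, as discussed in Section 8.5.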
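The steps (10.18) to (10.22) can be sketched with sample averages in place of expectations. The code below is an illustrative sketch, not the book's algorithm: it assumes NumPy, synthetic data, a truncated past vector, and a hypothetical function name `conditional_cca`. It removes the effect of the future inputs by orthogonal projection, forms the Cholesky factors $L$ and $M$, takes the SVD of the normalized conditional covariance, and returns $\mathcal{O}_k$, $\mathcal{C}_\infty$ and state samples $x(t)$.

```python
import numpy as np

def conditional_cca(F, P, Uf, n):
    """Sample conditional CCA following (10.18)-(10.22).

    F: future outputs (kp x N), P: truncated past input-output (q x N),
    Uf: future inputs (km x N). Columns are zero-mean samples; sample averages
    replace expectations. Returns (sigma, O_k, C_inf, x) with x of shape (n, N).
    """
    N = F.shape[1]

    def perp(Z):
        # sample version of E(Z | u_+^perp): remove the projection onto the rows of Uf
        return Z - (Z @ Uf.T) @ np.linalg.solve(Uf @ Uf.T, Uf)

    Fr, Pr = perp(F), perp(P)
    S_ff, S_pp, S_fp = Fr @ Fr.T / N, Pr @ Pr.T / N, Fr @ Pr.T / N
    L = np.linalg.cholesky(S_ff)                  # Sigma_ff|u = L L^T
    M = np.linalg.cholesky(S_pp)                  # Sigma_pp|u = M M^T
    X = np.linalg.solve(L, S_fp)                  # L^{-1} Sigma_fp|u
    normalized = np.linalg.solve(M, X.T).T        # L^{-1} Sigma_fp|u M^{-T}
    U, s, Vt = np.linalg.svd(normalized)          # (10.19)
    root = np.diag(np.sqrt(s[:n]))
    O_k = L @ U[:, :n] @ root                     # (10.21)
    C_inf = root @ Vt[:n] @ M.T                   # (10.21)
    x = root @ Vt[:n] @ np.linalg.solve(M, P)     # (10.22): Sigma^{1/2} V^T M^{-1} p(t)
    return s, O_k, C_inf, x

rng = np.random.default_rng(3)
N = 3000
Uf = rng.standard_normal((2, N))
P = rng.standard_normal((4, N)) + 0.5 * Uf[:1]              # past correlated with future inputs
F = rng.standard_normal((3, 4)) @ P + Uf[:1] + 0.3 * rng.standard_normal((3, N))
s, O_k, C_inf, x = conditional_cca(F, P, Uf, n=3)
print(s)   # sample conditional canonical correlations, all in [0, 1]
```

Here $n$ equals the full rank of the sample $\Sigma_{fp|u}$, so $\mathcal{O}_k \mathcal{C}_\infty$ reproduces it exactly. With real data one would instead choose $n$ by inspecting the singular values.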

Using the state vector $x$ defined above, the optimal predictor (10.9) is then expressed as
$$\hat{f}(t \mid t) = \mathcal{O}_k x(t) + \Psi_k u_+(t) \tag{10.25}$$
This equation shows that, given the future input $u_+(t)$, the state vector $x(t)$ carries the information from the past $\mathbf{P}_t^-$ that is necessary to predict the future output $f(t)$.
The property of x defined by (10.22) is summarized below. 


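Lemma 10.4. Given the future inputs, the process $\{x(t),\ t = 0, \pm 1, \dots\}$ defined above is a Markov process satisfying
$$\hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{P}_t^-\} = \hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{X}_t^{+/-}\}, \quad h = 1, 2, \dots$$
where $\mathbf{X}_t^{+/-}$ is the predictor space defined by (10.23).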
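Proof. Rewriting the formula (10.25) for $t \to t+h$ yields
$$\begin{bmatrix} \hat{y}(t+h \mid t+h) \\ \hat{y}(t+h+1 \mid t+h) \\ \vdots \\ \hat{y}(t+h+k-1 \mid t+h) \end{bmatrix} = \mathcal{O}_k x(t+h) + \Psi_k u_+(t+h) \tag{10.26}$$
where $\hat{y}(l \mid t+h)$, $l = t+h, \dots$, denotes the optimal estimate of $y(l)$ based on the observations up to time $t+h-1$ and the inputs from time $t+h$ on. Also, replacing $k$ by $k+h$ in (10.25),
$$\begin{bmatrix} \hat{y}(t \mid t) \\ \vdots \\ \hat{y}(t+h \mid t) \\ \vdots \\ \hat{y}(t+h+k-1 \mid t) \end{bmatrix} = \mathcal{O}_{k+h} x(t) + \Psi_{k+h} \begin{bmatrix} u(t) \\ \vdots \\ u(t+h) \\ \vdots \\ u(t+h+k-1) \end{bmatrix} \tag{10.27}$$
Since $\mathcal{O}_k$ has full column rank, the last $k$ block rows of $\mathcal{O}_{k+h} \in \mathbb{R}^{(k+h)p \times n}$ can be written as $\mathcal{O}_k A_h$ with $A_h \in \mathbb{R}^{n \times n}$.
Also, the last $k$ block rows of $\Psi_{k+h}$ are expressed as
$$\bar{\Psi}_{k+h} = \begin{bmatrix} G_h & \cdots & G_1 & G_0 & & & 0 \\ G_{h+1} & \cdots & G_2 & G_1 & G_0 & & \\ \vdots & & & & & \ddots & \\ G_{h+k-1} & \cdots & G_k & G_{k-1} & \cdots & G_1 & G_0 \end{bmatrix}$$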

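Hence, we can write the last $k$ block rows of (10.27) as
$$\begin{bmatrix} \hat{y}(t+h \mid t) \\ \hat{y}(t+h+1 \mid t) \\ \vdots \\ \hat{y}(t+h+k-1 \mid t) \end{bmatrix} = \mathcal{O}_k A_h x(t) + \bar{\Psi}_{k+h} \begin{bmatrix} u(t) \\ u(t+1) \\ \vdots \\ u(t+h+k-1) \end{bmatrix} \tag{10.28}$$
From the definition of $\hat{f}(t \mid t)$ and the property of the oblique projection, it can be shown that
$$\hat{E}_{\|\mathbf{U}_t^+}\{\hat{y}(t+h+l \mid t+h) \mid \mathbf{P}_t^-\} = \hat{E}_{\|\mathbf{U}_t^+}\{\hat{y}(t+h+l \mid t) \mid \mathbf{P}_t^-\}$$
holds for $l = 0, 1, \dots$. Thus, by applying the operator $\hat{E}_{\|\mathbf{U}_t^+}\{\cdot \mid \mathbf{P}_t^-\}$ to both sides of (10.26) and (10.28), and noting that it annihilates the input terms, which lie in $\mathbf{U}_t^+$, we have
$$\mathcal{O}_k \hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{P}_t^-\} = \mathcal{O}_k A_h \hat{E}_{\|\mathbf{U}_t^+}\{x(t) \mid \mathbf{P}_t^-\} = \mathcal{O}_k A_h x(t)$$
Since $\mathcal{O}_k$ has full column rank, it follows that
$$\hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{P}_t^-\} = A_h x(t) \in \mathbf{X}_t^{+/-}$$
Also, applying the operator $\hat{E}_{\|\mathbf{U}_t^+}\{\cdot \mid \mathbf{X}_t^{+/-}\}$ to the above equation yields
$$\hat{E}_{\|\mathbf{U}_t^+}\big\{\hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{P}_t^-\} \,\big|\, \mathbf{X}_t^{+/-}\big\} = A_h x(t) = \hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{P}_t^-\}$$
where the left-hand side of the above equation reduces to $\hat{E}_{\|\mathbf{U}_t^+}\{x(t+h) \mid \mathbf{X}_t^{+/-}\}$ by using $\mathbf{X}_t^{+/-} \subset \mathbf{P}_t^-$. This completes the proof.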
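The key step above is that the last $k$ block rows of $\mathcal{O}_{k+h}$ factor as $\mathcal{O}_k A_h$. A minimal sketch, assuming NumPy and that the extended observability matrix has the standard form $[C; CA; \dots; CA^{k-1}]$ for a hypothetical pair $(A, C)$, recovers $A_h$ by least squares and finds $A_h = A^h$. The CCA factor (10.21) is only determined up to a change of state basis, in which case $A_h$ is a similarity transform of $A^h$.

```python
import numpy as np

def ext_obs(A, C, k):
    """Extended observability matrix [C; CA; ...; CA^{k-1}] in a fixed state basis."""
    blocks, M = [], C.copy()
    for _ in range(k):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

# Hypothetical (A, C) pair with n = 2, p = 1
A = np.array([[0.7, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
k, h, p = 3, 2, 1

O_k = ext_obs(A, C, k)
last_k = ext_obs(A, C, k + h)[h * p:]                    # last k block rows of O_{k+h}
A_h, *_ = np.linalg.lstsq(O_k, last_k, rcond=None)       # solve O_k A_h = last_k
print(np.allclose(A_h, np.linalg.matrix_power(A, h)))   # True
```

Full column rank of $\mathcal{O}_k$ is what makes the least-squares solution unique, matching the rank argument in the proof.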
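Remark 10.1. The state vector defined by (10.22) is based on the conditional CCA, so that it differs from the state vector defined for stationary processes in Subsection 8.5.1. According to the discussion therein, it may seem more natural to define the state vector as
$$x^\circ(t) = \Sigma^{1/2} \alpha(t) = \mathcal{C}_\infty \Sigma_{pp|u}^{-1} E(p \mid u_+^\perp)(t)$$
in terms of the conditional canonical vector $\alpha(t)$ of (10.20), where $\operatorname{cov}\{x^\circ(t)\} = \Sigma$. It should, however, be noted that we cannot derive a causal state space model by using this $x^\circ(t)$. In fact, defining the subspace $\tilde{\mathbf{P}}_t := E\{\mathbf{P}_t^- \mid (\mathbf{U}_t^+)^\perp\}$, it follows that $\mathbf{P}_t^- \vee \mathbf{U}_t^+ = \tilde{\mathbf{P}}_t \oplus \mathbf{U}_t^+$. Thus we obtain the orthogonal decomposition
$$\begin{aligned} \hat{f}(t \mid t) &= E\{f(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\} = E\{f(t) \mid \tilde{\mathbf{P}}_t \oplus \mathbf{U}_t^+\} \\ &= E\{f(t) \mid \tilde{\mathbf{P}}_t\} + E\{f(t) \mid \mathbf{U}_t^+\} \end{aligned}$$
Since $\tilde{\mathbf{P}}_t = \operatorname{span}\{E(p \mid u_+^\perp)(t)\}$, the first term on the right-hand side of the above equation becomes
$$\begin{aligned} E\{f(t) \mid \tilde{\mathbf{P}}_t\} &= E\{f(t) [E(p \mid u_+^\perp)(t)]^T\} \big(\operatorname{cov}\{E(p \mid u_+^\perp)(t)\}\big)^{-1} E(p \mid u_+^\perp)(t) \\ &= \Sigma_{fp|u} \Sigma_{pp|u}^{-1} E(p \mid u_+^\perp)(t) = \mathcal{O}_k x^\circ(t) \end{aligned}$$
Thus, though similar to (10.25), we have a different optimal predictor
$$\hat{f}(t \mid t) = \mathcal{O}_k x^\circ(t) + \Psi_k^\circ u_+(t) \tag{10.29}$$
where $\Psi_k^\circ$ is a non-causal operator defined by
$$\Psi_k^\circ u_+(t) := E\{f(t) \mid \mathbf{U}_t^+\} = E\{f(t) u_+^T(t)\} \big(E\{u_+(t) u_+^T(t)\}\big)^{-1} u_+(t)$$
Since (10.29) is not a causal predictor, $x^\circ(t)$ in (10.29) cannot be the state vector of a causal model.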
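A small worked case shows why $\Psi_k^\circ$ is non-causal. Under the hypothetical assumptions that the input is the MA(1) process $u(t) = e(t) + 0.9\,e(t-1)$ with unit-variance white $e$ and that $y(t) = u(t-1)$ (so $G_0 = 0$, $G_1 = 1$), the causal operator (10.14) with $k = 2$ is $\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$. The orthogonal-projection operator $\Psi_k^\circ$, computed exactly from the input autocovariances with NumPy, has a nonzero entry above the diagonal: $y(t) = u(t-1)$ is partly "predicted" from the future input $u(t+1)$.

```python
import numpy as np

# Hypothetical example: u(t) = e(t) + c e(t-1) with unit-variance white e, and y(t) = u(t-1),
# so G_0 = 0, G_1 = 1 and the causal operator (10.14) with k = 2 is [[0, 0], [1, 0]].
c = 0.9

def r_u(lag):
    """Autocovariance of the MA(1) input at the given lag."""
    return {0: 1.0 + c**2, 1: c}.get(abs(lag), 0.0)

# f(t) = [y(t); y(t+1)] = [u(t-1); u(t)] and u_+(t) = [u(t); u(t+1)], indexed by time offsets
f_times, u_times = [-1, 0], [0, 1]
S_fu = np.array([[r_u(i - j) for j in u_times] for i in f_times])
S_uu = np.array([[r_u(i - j) for j in u_times] for i in u_times])
Psi0 = S_fu @ np.linalg.inv(S_uu)     # E{f(t) | U_t^+} = Psi0 u_+(t), as in Remark 10.1
Psi_causal = np.array([[0.0, 0.0], [1.0, 0.0]])
print(Psi0)   # first row about [0.66, -0.33], second row about [1, 0]
```

The upper-right entry equals $-c^2 / ((1+c^2)^2 - c^2) \approx -0.33$, so $\Psi_k^\circ$ differs from the block lower triangular $\Psi_k$.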


We are now in a position to derive the innovation representation for a stochastic 
system with exogenous inputs. 


10.4 Innovation Representation 


In this section, by means of the state vector $x$ of (10.22) and the optimal predictor $\hat{f}(t \mid t)$ of (10.25), we derive a forward innovation model for the output process $y$.

Let $\mathbf{U}_t = \operatorname{span}\{u(t)\}$ be the subspace spanned by $u(t)$. By Lemma 10.3, which shows the causality of the predictor, the first $p$ rows of (10.25) give the one-step prediction of $y(t)$ based on $\mathbf{P}_t^- \vee \mathbf{U}_t^+$, so that we have
$$\begin{aligned} \hat{y}(t \mid t) &= E\{y(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\} \\ &= E\{y(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\} = C x(t) + D u(t) \end{aligned} \tag{10.30}$$
where $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$ are constant matrices. Since $\mathbf{P}_t^- \cap \mathbf{U}_t^+ = \{0\}$ from the proof of Theorem 10.1, the right-hand side of (10.30) is a unique direct sum decomposition. Define the prediction error as
$$e(t) = y(t) - E\{y(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\} \tag{10.31}$$
Then, the output equation is given by
$$y(t) = C x(t) + D u(t) + e(t) \tag{10.32}$$
Since the projection $E\{y(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\}$ is based on the infinite past and $(u, y)$ are jointly stationary, the prediction error $e$ is also stationary. Moreover, from (10.31), the prediction error $e(t)$ is uncorrelated with the past outputs $\{y(t-1), y(t-2), \dots\}$ and the present and past inputs $\{u(t), u(t-1), \dots\}$, so that $e(t) \perp (\mathbf{Y}_t^- \vee \mathbf{U}_{t+1}^-)$. Since $e(t+1) \perp (\mathbf{Y}_{t+1}^- \vee \mathbf{U}_{t+2}^-)$, and since it follows from (10.31) that
$$e(t) \in \mathbf{P}_{t+1}^- = \mathbf{Y}_{t+1}^- \vee \mathbf{U}_{t+1}^- \subset \mathbf{Y}_{t+1}^- \vee \mathbf{U}_{t+2}^-$$
we get $e(t+1) \perp e(t)$, and hence $e$ is a white noise.
Now we compute the dynamics satisfied by $x(t)$. To this end, we define
$$w(t) = x(t+1) - E\{x(t+1) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\} \tag{10.33}$$
where $\mathbf{X}_t^{+/-} = \operatorname{span}\{x(t)\}$. Since $\mathbf{X}_t^{+/-} \subset \mathbf{P}_t^-$ and $\mathbf{P}_t^- \cap \mathbf{U}_t^+ = \{0\}$, we have $\mathbf{X}_t^{+/-} \cap \mathbf{U}_t = \{0\}$, so that the second term on the right-hand side of the above equation can be expressed as a direct sum of two oblique projections. Thus there are constant matrices $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ satisfying
$$E\{x(t+1) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\} = A x(t) + B u(t) \tag{10.34}$$
Thus the state equation is given by
$$x(t+1) = A x(t) + B u(t) + w(t)$$
Finally, we show that $w$ on the right-hand side of the above equation can be expressed in terms of the innovation process $e$.


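Lemma 10.5. The prediction error $w(t)$ is a function of $e(t)$. In fact, there exists a matrix $K \in \mathbb{R}^{n \times p}$ such that
$$w(t) = K e(t) \tag{10.35}$$
where $K = E\{w(t) e^T(t)\} (\operatorname{cov}\{e(t)\})^{-1}$.

Proof. Since $x(t+1)$ is a function of $\{y(t), u(t), y(t-1), u(t-1), \dots\}$, it follows from (10.34) that
$$A x(t) + B u(t) = E\{x(t+1) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\}$$
From Lemma 10.4 (with $h = 1$), the first term on the right-hand side of the above equation is given by
$$\hat{E}_{\|\mathbf{U}_t^+}\{x(t+1) \mid \mathbf{X}_t^{+/-}\} = \hat{E}_{\|\mathbf{U}_t^+}\{x(t+1) \mid \mathbf{P}_t^-\} = A x(t) \tag{10.36}$$
Define $\hat{E}_{\|\mathbf{P}_t^-}\{x(t+1) \mid \mathbf{U}_t\} := B_1 u(t)$. Then, it follows that
$$E\{x(t+1) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\} = A x(t) + B_1 u(t)$$
Since $(\mathbf{X}_t^{+/-} \vee \mathbf{U}_t) \subset (\mathbf{P}_t^- \vee \mathbf{U}_t)$, we obtain
$$E\{x(t+1) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\} = E\big\{E\{x(t+1) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\} \,\big|\, \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\big\}$$
The left-hand side is $A x(t) + B u(t)$, while the right-hand side is
$$E\{A x(t) + B_1 u(t) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\} = A x(t) + B_1 u(t)$$
so that $B_1 u(t) = B u(t)$ holds for any $u(t)$, implying that $B_1 = B$. Thus we have
$$\hat{E}_{\|\mathbf{X}_t^{+/-}}\{x(t+1) \mid \mathbf{U}_t\} = B u(t) = \hat{E}_{\|\mathbf{P}_t^-}\{x(t+1) \mid \mathbf{U}_t\} \tag{10.37}$$
Combining (10.36) and (10.37) yields
$$\begin{aligned} E\{x(t+1) \mid \mathbf{X}_t^{+/-} \vee \mathbf{U}_t\} &= \hat{E}_{\|\mathbf{U}_t^+}\{x(t+1) \mid \mathbf{P}_t^-\} + \hat{E}_{\|\mathbf{P}_t^-}\{x(t+1) \mid \mathbf{U}_t\} \\ &= E\{x(t+1) \mid \mathbf{P}_t^- \vee \mathbf{U}_t\} \\ &= E\{x(t+1) \mid \mathbf{Y}_t^- \vee \mathbf{U}_{t+1}^-\} \end{aligned}$$
Thus, from (10.33), $w(t)$ is also expressed as
$$w(t) = x(t+1) - E\{x(t+1) \mid \mathbf{Y}_t^- \vee \mathbf{U}_{t+1}^-\}$$
so that $w(t) \in \mathbf{P}_{t+1}^-$ is orthogonal to $\mathbf{Y}_t^- \vee \mathbf{U}_{t+1}^-$.
On the other hand, the subspace $\mathbf{P}_{t+1}^-$ can be expressed as
$$\begin{aligned} \mathbf{P}_{t+1}^- &= \operatorname{span}\{y(t), y(t-1), \dots, u(t), u(t-1), \dots\} \\ &= \operatorname{span}\{e(t), y(t-1), \dots, u(t), u(t-1), \dots\} \\ &= \operatorname{span}\{e(t)\} \oplus (\mathbf{Y}_t^- \vee \mathbf{U}_{t+1}^-) \end{aligned}$$
It therefore follows that $w(t)$ is expressed as a function of $e(t)$. The matrix $K$ is obtained from $w(t) = E\{w(t) \mid e(t)\} = K e(t)$.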
Summarizing the above results, we have the main theorem of this chapter.

Theorem 10.2. Suppose that Assumptions 10.1 and 10.2 are satisfied. If the rank condition $\operatorname{rank}(\Sigma_{fp|u}) = n$ holds, then the output $y$ is described by the state space model
$$x(t+1) = A x(t) + B u(t) + K e(t) \tag{10.38a}$$
$$y(t) = C x(t) + D u(t) + e(t) \tag{10.38b}$$
This is a forward innovation model with the exogenous input $u$, and the state vector $x$ is an $n$-dimensional basis vector of the predictor space $\mathbf{X}_t^{+/-}$ given by (10.23).


Thus, it follows from (10.38) that the input-output relation is expressed as in Figure 10.3, where
$$P(z) = D + C(zI - A)^{-1} B, \qquad H(z) = I_p + C(zI - A)^{-1} K$$
Since the plant $P(z)$ and the noise model $H(z)$ have the same poles, we cannot parametrize these models independently. This is a consequence of the present approach itself, which is based on the conditional CCA with exogenous inputs.


Figure 10.3. Transfer matrix model 
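The shared-pole property can be seen directly in code: $P(z)$ and $H(z)$ are built from the same resolvent $(zI - A)^{-1}$. The sketch below assumes NumPy and hypothetical matrices for a model of the form (10.38); the helper name `plant_and_noise` is illustrative. It evaluates both transfer matrices at a point on the unit circle, lists the common poles (the eigenvalues of $A$), and checks that $H(z) \to I_p$ as $|z| \to \infty$.

```python
import numpy as np

def plant_and_noise(A, B, C, D, K, z):
    """Evaluate P(z) = D + C(zI - A)^{-1}B and H(z) = I_p + C(zI - A)^{-1}K at a complex point z."""
    n, p = A.shape[0], C.shape[0]
    R = np.linalg.solve(z * np.eye(n) - A, np.eye(n))   # (zI - A)^{-1}, shared by P and H
    return D + C @ R @ B, np.eye(p) + C @ R @ K

# Hypothetical innovation model (10.38) with n = 2 and m = p = 1
A = np.array([[0.8, 0.1], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.0]])
K = np.array([[0.3], [0.2]])

P_w, H_w = plant_and_noise(A, B, C, D, K, np.exp(1j * 0.3))   # frequency omega = 0.3
poles = np.sort(np.linalg.eigvals(A).real)   # common poles of P(z) and H(z)
_, H_big = plant_and_noise(A, B, C, D, K, 1e8)
print(poles)                      # [0.4 0.8]
print(np.round(H_big.real, 6))    # H(z) tends to I_p as |z| grows
```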


From Kalman filtering theory, we see that all other minimal representations for the output $y$ are given by
$$z(t+1) = A z(t) + B u(t) + F v(t) \tag{10.39a}$$
$$y(t) = C z(t) + D u(t) + J v(t) \tag{10.39b}$$
where $v$ is a zero mean white noise with covariance matrix $I_q$ $(q \ge p)$. The matrices $F \in \mathbb{R}^{n \times q}$ and $J \in \mathbb{R}^{p \times q}$ are constant, and $A, B, C, D$ are the same as those given in (10.38)¹. Also, the state vector $x$ of the innovation model (10.38) is the minimum variance estimate of the state vector $z(t)$ of (10.39), i.e.,
$$x(t) = E\{z(t) \mid \mathbf{P}_t^- \vee \mathbf{U}_t^+\} = E\{z(t) \mid \mathbf{P}_t^-\}$$
The relation between $F$ and $J$ in (10.39) and $K$ and $R := E\{e(t) e^T(t)\}$ has already been explained in Section 5.4.

¹See Subsection 7.3.1, where the size of spectral factors associated with Markov models is discussed.
10.5 Stochastic Realization Based on Finite Data


In practice, the computations must be performed based on finite input-output data, and the construction of the innovation model should be based on the predictor of the future outputs obtained from the available finite data.

Suppose that $\tau < t < T$, and let $t$ be the present time. Let $w(\tau), \dots, w(t-1)$ be the truncated past vectors, and define the stacked vector
$$p_\tau(t) := \begin{bmatrix} w(t-1) \\ w(t-2) \\ \vdots \\ w(\tau) \end{bmatrix}$$
and let $\mathbf{P}_{[\tau, t)}$ denote the past data space spanned by the above vector $p_\tau(t)$. The symbol $\mathbf{U}_{[t, T]}$ denotes the (finite) future input history after time $t$.
From these data we can form the finite-memory predictor at time $t$ as
$$\begin{aligned} \hat{f}_\tau(t \mid t) &= E\{f(t) \mid \mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}\} \\ &= E\{\hat{f}(t \mid t) \mid \mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}\} \\ &= \hat{E}_{\|\mathbf{U}_{[t,T]}}\{\hat{f}(t \mid t) \mid \mathbf{P}_{[\tau,t)}\} + \hat{E}_{\|\mathbf{P}_{[\tau,t)}}\{\hat{f}(t \mid t) \mid \mathbf{U}_{[t,T]}\} \end{aligned}$$
where $\hat{f}(t \mid t)$ is defined by (10.9). The following result, which we state without a detailed proof, explains the role of the transient Kalman filter in finite data modeling; see also Theorem 6 in [107] and Theorem 3 in [165].


Theorem 10.3. Suppose that Assumptions 10.1 and 10.2 are satisfied. If $\Sigma_{fp|u}$ has rank $n$, the process $y$ admits a finite interval realization of the form
$$\hat{x}_\tau(t+1) = A \hat{x}_\tau(t) + B u(t) + K(t) \hat{e}_\tau(t) \tag{10.40a}$$
$$y(t) = C \hat{x}_\tau(t) + D u(t) + \hat{e}_\tau(t) \tag{10.40b}$$
where the state vector $\hat{x}_\tau(t)$ is a basis in the finite memory predictor space
$$\hat{\mathbf{X}}_\tau(t) := E\{\mathbf{X}_t^{+/-} \mid \mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}\}$$
and the process $\{\hat{e}_\tau(t),\ \tau \le t \le T\}$ is the transient innovation of the output process $\{y(t),\ \tau \le t \le T\}$ with respect to $\mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}$.

Proof. The result is proved by applying the Kalman filter algorithm to (10.39). See the finite interval realization in [106, 107].


We briefly comment on the non-stationary realization stated in Theorem 10.3. We see that any basis $\hat{x}_\tau(t) \in \hat{\mathbf{X}}_\tau(t)$ has a representation
$$\hat{x}_\tau(t) = E\{x(t) \mid \mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}\}, \quad t > \tau \tag{10.41}$$
where $x(t)$ is a basis in the stationary predictor space $\mathbf{X}_t^{+/-}$, and hence $\hat{x}_\tau(t)$ is also the transient Kalman filter estimate of $z(t)$, the state vector of (10.39), given the data $\mathbf{P}_{[\tau,t)} \vee \mathbf{U}_{[t,T]}$. The initial state for (10.40a) is
$$\hat{x}_\tau(\tau) = E\{z(\tau) \mid \mathbf{U}_{[\tau,T]}\}$$
and the matrices $A, B, C, D$ are the same as those in (10.38).

We define the error covariance matrix of the state vector $z(t)$ of the system (10.39) as
$$P(t) = E\{[z(t) - \hat{x}_\tau(t)][z(t) - \hat{x}_\tau(t)]^T\}$$
It then follows from (5.66) that $P(t)$ satisfies the Riccati equation
$$\begin{aligned} P(t+1) = {} & A P(t) A^T - (A P(t) C^T + F J^T)(C P(t) C^T + J J^T)^{-1} \\ & \times (C P(t) A^T + J F^T) + F F^T \end{aligned}$$
where $P(\tau) = \operatorname{cov}\{z(\tau) - \hat{x}_\tau(\tau)\}$, and the transient Kalman gain is given by
$$K(t) = (A P(t) C^T + F J^T)(C P(t) C^T + J J^T)^{-1}$$
Also, as $\tau \to -\infty$, the state vector $\hat{x}_\tau(t)$ of the transient innovation model (10.40) converges to $x(t)$. Moreover, $P(t)$ converges to the unique stabilizing solution of the ARE
$$P = A P A^T - (A P C^T + F J^T)(C P C^T + J J^T)^{-1}(C P A^T + J F^T) + F F^T \tag{10.42}$$
and hence $K(t)$ converges to $K = (A P C^T + F J^T)(C P C^T + J J^T)^{-1}$.
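The convergence of the transient Riccati recursion to the ARE (10.42) can be observed numerically. The sketch below assumes NumPy and hypothetical system matrices $(A, C, F, J)$ with $A$ stable and $J J^T > 0$. It iterates the recursion from an arbitrary $P(\tau)$, checks that the iterate has become a fixed point, and verifies that the limiting gain gives a stable $A - KC$. The helper name `riccati_step` is illustrative.

```python
import numpy as np

def riccati_step(P, A, C, F, J):
    """One step of the Riccati recursion preceding (10.42); returns P(t+1) and the gain K(t)."""
    S = A @ P @ C.T + F @ J.T          # A P(t) C^T + F J^T
    R = C @ P @ C.T + J @ J.T          # C P(t) C^T + J J^T
    K = S @ np.linalg.inv(R)           # transient Kalman gain K(t)
    return A @ P @ A.T - K @ S.T + F @ F.T, K

# Hypothetical system matrices for (10.39) with n = 2, p = 1, q = 2
A = np.array([[0.9, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.5, 0.0], [0.2, 0.3]])
J = np.array([[0.0, 1.0]])

P = np.eye(2)                          # an arbitrary initial value P(tau)
for _ in range(500):
    P, K = riccati_step(P, A, C, F, J)
P_next, _ = riccati_step(P, A, C, F, J)
print(np.abs(P_next - P).max())                     # tiny: P (numerically) solves the ARE (10.42)
print(np.abs(np.linalg.eigvals(A - K @ C)).max())   # below 1: the limit is stabilizing
```

A dedicated solver (for example a discrete ARE routine with a cross-term argument) would give the same limit more efficiently; plain iteration is used here only because it mirrors the transient filter described in the text.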


Remark 10.2. The conditional CCA procedure of Section 10.3 applied to finite past data provides an approximate state vector $\hat{x}_\tau(t)$ that differs from $x(t)$ by an additive initial condition term, which tends to zero as $\tau \to -\infty$. In fact, from (10.41),
$$\hat{x}_\tau(t) = \hat{E}_{\|\mathbf{U}_{[t,T]}}\{x(t) \mid \mathbf{P}_{[\tau,t)}\} + \hat{E}_{\|\mathbf{P}_{[\tau,t)}}\{x(t) \mid \mathbf{U}_{[t,T]}\} \tag{10.43}$$
Recall from (10.24) that $x(t) = \mathcal{O}_k^\dagger \hat{E}_{\|\mathbf{U}_t^+}\{f(t) \mid \mathbf{P}_t^-\}$ holds. Then, the first term on the right-hand side of the above equation is an oblique projection which can be obtained by the conditional CCA of the finite future and past data.

Since $x(t) \in \mathbf{P}_t^-$ holds, the second term on the right-hand side of (10.43) tends to zero as $\tau \to -\infty$ by the absence of feedback; indeed, the oblique projection of $x(t)$ onto the future $\mathbf{U}_{[t,T]}$ along the past $\mathbf{P}_{[\tau,t)}$ tends to the oblique projection along $\mathbf{P}_t^-$, which is clearly zero.
10.6 CCA Method


In this section, a procedure for computing the matrices $\Pi_k$ and $\Psi_k$ from finite input-output data is developed. The basic step is to compute approximate solutions of the discrete Wiener-Hopf equations (10.10), from which we have
$$\Pi_k = \Sigma_{fp|u} \Sigma_{pp|u}^{-1} \tag{10.44}$$
$$\Psi_k = \Sigma_{fu|p} \Sigma_{uu|p}^{-1} \tag{10.45}$$
Once we obtain $\Pi_k$ and $\Psi_k$, subspace identification methods for computing the system parameters $A, B, C, D, K$ are easily derived.

Suppose that finite input-output data $u(t), y(t)$ for $t = 0, 1, \cdots, N + 2k - 2$ are given, with $N$ sufficiently large and $k$ positive. We assume that the time series $\{u(t), y(t)\}$ are sample values of the jointly stationary processes $(u, y)$ that satisfy the assumptions of the previous sections, in particular the finite dimensionality and the feedback-free conditions. In addition, we assume throughout this section that the sample averages converge to the "true" expected values as $N \to \infty$.

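Recall that $d = p + m$ is the dimension of the joint process $(u, y)$. Define the $kd \times N$ block Toeplitz matrix

$$W_{0|k-1} = \begin{bmatrix} w(k-1) & w(k) & \cdots & w(k+N-2) \\ w(k-2) & w(k-1) & \cdots & w(k+N-3) \\ \vdots & \vdots & & \vdots \\ w(0) & w(1) & \cdots & w(N-1) \end{bmatrix} \in \mathbb{R}^{kd \times N}$$

where $W_{0|k-1}$ denotes the past input-output data. Also, define the block Hankel matrices

$$U_{k|2k-1} = \begin{bmatrix} u(k) & u(k+1) & \cdots & u(k+N-1) \\ u(k+1) & u(k+2) & \cdots & u(k+N) \\ \vdots & \vdots & & \vdots \\ u(2k-1) & u(2k) & \cdots & u(N+2k-2) \end{bmatrix} \in \mathbb{R}^{km \times N}$$

and

$$Y_{k|2k-1} = \begin{bmatrix} y(k) & y(k+1) & \cdots & y(k+N-1) \\ y(k+1) & y(k+2) & \cdots & y(k+N) \\ \vdots & \vdots & & \vdots \\ y(2k-1) & y(2k) & \cdots & y(N+2k-2) \end{bmatrix} \in \mathbb{R}^{kp \times N}$$

where $U_{k|2k-1}$ and $Y_{k|2k-1}$ denote the future input data and the future output data, respectively.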
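These data matrices can be formed with a few slicing operations. The NumPy sketch below stores the samples row-wise: $u$ is an array of shape $(T, m)$ and $y$ is an array of shape $(T, p)$, with $T \geq N + 2k - 1$. It stacks $w(t) = [u^T(t)\ y^T(t)]^T$. This ordering inside $w(t)$ is our assumption, and the function name is hypothetical.

```python
import numpy as np

def build_data_matrices(u, y, k, N):
    """Return W_{0|k-1} (kd x N), U_{k|2k-1} (km x N) and Y_{k|2k-1} (kp x N)."""
    w = np.hstack([u, y])                    # w(t) = [u(t); y(t)], one row per time t
    # Past (block Toeplitz): block row i holds w(k-1-i), ..., w(k-1-i+N-1)
    W = np.vstack([w[k - 1 - i : k - 1 - i + N].T for i in range(k)])
    # Future (block Hankel): block row i holds u(k+i), ..., u(k+i+N-1)
    U = np.vstack([u[k + i : k + i + N].T for i in range(k)])
    Y = np.vstack([y[k + i : k + i + N].T for i in range(k)])
    return W, U, Y

# Toy data with m = p = 1, k = 3, N = 5 (needs N + 2k - 1 = 10 samples)
u = np.arange(10.0).reshape(-1, 1)
y = 100.0 + np.arange(10.0).reshape(-1, 1)
W, U, Y = build_data_matrices(u, y, k=3, N=5)
```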

In the following, we assume that the integer $k$ is chosen so that $k > n$, where $n$ is the dimension of the underlying stochastic system generating the data. We also assume that the input is PE of order $2k$, so that $U_{0|2k-1}$ has full row rank. Consider the following LQ decomposition:
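$$\frac{1}{\sqrt{N}}\begin{bmatrix} U_{k|2k-1} \\ W_{0|k-1} \\ Y_{k|2k-1} \end{bmatrix} = \begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \end{bmatrix} =: RQ^T \qquad (10.46)$$

where $R_{11} \in \mathbb{R}^{km \times km}$, $R_{22} \in \mathbb{R}^{kd \times kd}$ and $R_{33} \in \mathbb{R}^{kp \times kp}$ are lower triangular, and the matrices $Q_i$ are orthogonal with $Q_i^T Q_j = \delta_{ij} I$.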
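By using (10.46), we have

$$\begin{bmatrix} \Sigma_{uu} & \Sigma_{up} & \Sigma_{uf} \\ \Sigma_{pu} & \Sigma_{pp} & \Sigma_{pf} \\ \Sigma_{fu} & \Sigma_{fp} & \Sigma_{ff} \end{bmatrix} = \frac{1}{N}\begin{bmatrix} U_{k|2k-1} \\ W_{0|k-1} \\ Y_{k|2k-1} \end{bmatrix}\begin{bmatrix} U_{k|2k-1} \\ W_{0|k-1} \\ Y_{k|2k-1} \end{bmatrix}^T = RR^T$$

It therefore follows that $\Sigma_{uu} = R_{11}R_{11}^T$, $\Sigma_{pu} = R_{21}R_{11}^T$, $\Sigma_{fu} = R_{31}R_{11}^T$, and

$$\Sigma_{pp} = R_{21}R_{21}^T + R_{22}R_{22}^T, \qquad \Sigma_{fp} = R_{31}R_{21}^T + R_{32}R_{22}^T$$
$$\Sigma_{ff} = R_{31}R_{31}^T + R_{32}R_{32}^T + R_{33}R_{33}^T$$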


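From the definition of conditional expectation of Lemma 10.1, we get

$$\Sigma_{ff|u} = \Sigma_{ff} - \Sigma_{fu}\Sigma_{uu}^{-1}\Sigma_{uf} = R_{32}R_{32}^T + R_{33}R_{33}^T \qquad (10.47a)$$
$$\Sigma_{fp|u} = \Sigma_{fp} - \Sigma_{fu}\Sigma_{uu}^{-1}\Sigma_{up} = R_{32}R_{22}^T \qquad (10.47b)$$
$$\Sigma_{pp|u} = \Sigma_{pp} - \Sigma_{pu}\Sigma_{uu}^{-1}\Sigma_{up} = R_{22}R_{22}^T \qquad (10.47c)$$
$$\Sigma_{fu|p} = \Sigma_{fu} - \Sigma_{fp}\Sigma_{pp}^{-1}\Sigma_{pu} = R_{31}R_{11}^T - (R_{31}R_{21}^T + R_{32}R_{22}^T)\Sigma_{pp}^{-1}R_{21}R_{11}^T \qquad (10.47d)$$
$$\Sigma_{uu|p} = \Sigma_{uu} - \Sigma_{up}\Sigma_{pp}^{-1}\Sigma_{pu} = R_{11}R_{11}^T - R_{11}R_{21}^T\Sigma_{pp}^{-1}R_{21}R_{11}^T \qquad (10.47e)$$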


It should be noted here that $\Sigma_{uu}$ is positive definite by the PE condition. We also assume that $\Sigma_{pp|u}$ and $\Sigma_{uu|p}$ are positive definite.
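The following NumPy sketch computes (10.46) and (10.47a) to (10.47c), assuming the data matrices have already been formed. The LQ decomposition is obtained from the QR factorization of the transposed data matrix, and the conditional covariances are read off from the blocks of $R$. The function name and the random test data are hypothetical. The sign convention of `numpy.linalg.qr` may make some diagonal entries of $R$ negative, which does not affect the products $R_{ij}R_{kl}^T$.

```python
import numpy as np

def lq_conditional_covariances(U, W, Y):
    """LQ decomposition (10.46) and Sigma_{ff|u}, Sigma_{fp|u}, Sigma_{pp|u} of (10.47)."""
    N = U.shape[1]
    H = np.vstack([U, W, Y]) / np.sqrt(N)
    Q, Rt = np.linalg.qr(H.T)            # H^T = Q Rt, hence H = Rt^T Q^T = R Q^T
    R = Rt.T                             # lower triangular factor
    a = U.shape[0]                       # km
    b = a + W.shape[0]                   # km + kd
    R22, R32, R33 = R[a:b, a:b], R[b:, a:b], R[b:, b:]
    Sff_u = R32 @ R32.T + R33 @ R33.T    # (10.47a)
    Sfp_u = R32 @ R22.T                  # (10.47b)
    Spp_u = R22 @ R22.T                  # (10.47c)
    return R, Sff_u, Sfp_u, Spp_u

rng = np.random.default_rng(1)
U = rng.standard_normal((4, 300))   # hypothetical future inputs  (km = 4)
W = rng.standard_normal((6, 300))   # hypothetical past data      (kd = 6)
Y = rng.standard_normal((3, 300))   # hypothetical future outputs (kp = 3)
R, Sff_u, Sfp_u, Spp_u = lq_conditional_covariances(U, W, Y)
```

Working with $R$ avoids forming and inverting $\Sigma_{uu}$ explicitly, which is the numerical motivation for the LQ route.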


Lemma 10.6. In terms of $R_{ij}$ of (10.46), $\Pi_k$ and $\Psi_k$ are respectively expressed as

$$\Pi_k = R_{32}R_{22}^{-1} \qquad (10.48)$$
$$\Psi_k = (R_{31} - R_{32}R_{22}^{-1}R_{21})R_{11}^{-1} \qquad (10.49)$$


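Proof. Since $R_{22}^T(R_{22}R_{22}^T)^{-1} = R_{22}^{-1}$, (10.48) follows directly from (10.44), (10.47b) and (10.47c). We now show (10.49). It follows from (10.47d) and (10.47e) that

$$\Sigma_{fu|p} = R_{31}R_{11}^T - (R_{31}R_{21}^T + R_{32}R_{22}^T)\Sigma_{pp}^{-1}R_{21}R_{11}^T = R_{31}(I_{km} - R_{21}^T\Sigma_{pp}^{-1}R_{21})R_{11}^T - R_{32}R_{22}^T\Sigma_{pp}^{-1}R_{21}R_{11}^T$$

and $\Sigma_{uu|p} = R_{11}(I_{km} - R_{21}^T\Sigma_{pp}^{-1}R_{21})R_{11}^T$. From (10.45),

$$\begin{aligned}
\Psi_k &= \Sigma_{fu|p}\Sigma_{uu|p}^{-1} \\
&= R_{31}R_{11}^{-1} - R_{32}R_{22}^T\Sigma_{pp}^{-1}R_{21}(I_{km} - R_{21}^T\Sigma_{pp}^{-1}R_{21})^{-1}R_{11}^{-1} \\
&= R_{31}R_{11}^{-1} - R_{32}R_{22}^T(I_{kd} - \Sigma_{pp}^{-1}R_{21}R_{21}^T)^{-1}\Sigma_{pp}^{-1}R_{21}R_{11}^{-1} \\
&= R_{31}R_{11}^{-1} - R_{32}R_{22}^T(\Sigma_{pp} - R_{21}R_{21}^T)^{-1}R_{21}R_{11}^{-1} \\
&= \left(R_{31} - R_{32}R_{22}^T(R_{22}R_{22}^T)^{-1}R_{21}\right)R_{11}^{-1}
\end{aligned}$$

The third equality uses the identity $X(I - YX)^{-1} = (I - XY)^{-1}X$ with $X = \Sigma_{pp}^{-1}R_{21}$ and $Y = R_{21}^T$. Since $R_{22}^T(R_{22}R_{22}^T)^{-1} = R_{22}^{-1}$, the right-hand side equals that of (10.49).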
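Lemma 10.6 can also be checked numerically. The sketch below uses randomly generated blocks $R_{ij}$ with invertible diagonal blocks as hypothetical test data. It evaluates (10.48) and (10.49) with linear solves instead of explicit inverses. A check against the definitions (10.44) and (10.45), computed from $\Sigma = RR^T$, is kept separate.

```python
import numpy as np

def pi_psi_from_R(R11, R21, R22, R31, R32):
    """Pi_k = R32 R22^{-1} (10.48) and Psi_k = (R31 - R32 R22^{-1} R21) R11^{-1} (10.49)."""
    Pi = np.linalg.solve(R22.T, R32.T).T                 # solves X R22 = R32
    Psi = np.linalg.solve(R11.T, (R31 - Pi @ R21).T).T   # solves X R11 = R31 - Pi R21
    return Pi, Psi

rng = np.random.default_rng(2)
km, kd, kp = 2, 3, 2
R11 = np.tril(rng.standard_normal((km, km)), -1) + 2 * np.eye(km)
R22 = np.tril(rng.standard_normal((kd, kd)), -1) + 2 * np.eye(kd)
R21 = rng.standard_normal((kd, km))
R31 = rng.standard_normal((kp, km))
R32 = rng.standard_normal((kp, kd))
Pi, Psi = pi_psi_from_R(R11, R21, R22, R31, R32)
```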


Comparing the LQ decompositions (10.46) and (6.59), we see that $\Pi_k W_{0|k-1}$ and $\Psi_k$ obtained in Lemma 10.6 coincide with the corresponding quantities in (6.65) and (6.66), respectively. Thus, the present method based on the CCA technique is closely related to the N4SID method. The numerical procedure below, however, differs from that of N4SID in the way the SVD is used to obtain the extended observability matrix.

In the following algorithm, it is assumed that the conditional covariance matrices $\Sigma_{ff|u}$, $\Sigma_{pp|u}$ and $\Sigma_{fp|u}$ of (10.47) have already been obtained.


Subspace Identification of Stochastic System: CCA Method
Step 1: Compute the square roots of the conditional covariance matrices²

$$\Sigma_{ff|u} = LL^T, \qquad \Sigma_{pp|u} = MM^T$$

Step 2: Compute the normalized SVD [see (10.19)]

$$L^{-1}\Sigma_{fp|u}M^{-T} = USV^T \simeq \hat{U}\hat{S}\hat{V}^T$$

and then we get

$$\Sigma_{fp|u} \simeq L\hat{U}\hat{S}\hat{V}^TM^T$$

where $\hat{S}$ is obtained by neglecting the smaller singular values, so that the dimension of the state vector equals $\dim(\hat{S})$.
Step 3: Define the extended observability and reachability matrices by [see (10.21)]

$$\mathcal{O}_k = L\hat{U}\hat{S}^{1/2}, \qquad \mathcal{C}_k = \hat{S}^{1/2}\hat{V}^TM^T$$
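Steps 1 to 3 can be sketched in NumPy as follows. In line with the footnote to Step 1, the square roots are computed from an SVD rather than a Cholesky factorization, and inverses are replaced by pseudo-inverses. Here the state dimension `n` is passed in explicitly; in practice it would be chosen by inspecting the singular values. The function names and the random test covariances are hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Square-root factor L with L @ L.T == S, computed from the SVD of a PSD matrix."""
    U, s, _ = np.linalg.svd((S + S.T) / 2)
    return U * np.sqrt(s)                               # scales column j of U by sqrt(s[j])

def cca_steps_1_to_3(Sff_u, Spp_u, Sfp_u, n):
    L, M = psd_sqrt(Sff_u), psd_sqrt(Spp_u)             # Step 1
    Linv, Minv = np.linalg.pinv(L), np.linalg.pinv(M)
    U, s, Vt = np.linalg.svd(Linv @ Sfp_u @ Minv.T)     # Step 2: normalized SVD
    U1, s1, V1t = U[:, :n], s[:n], Vt[:n, :]            # keep the n largest singular values
    sq = np.sqrt(s1)
    Ok = (L @ U1) * sq                                  # Step 3: O_k = L U S^{1/2}
    Ck = (sq[:, None] * V1t) @ M.T                      #         C_k = S^{1/2} V^T M^T
    return Ok, Ck, s

rng = np.random.default_rng(0)
G1 = rng.standard_normal((6, 6)); Sff = G1 @ G1.T + 6 * np.eye(6)
G2 = rng.standard_normal((8, 8)); Spp = G2 @ G2.T + 8 * np.eye(8)
Sfp = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 8))   # exactly rank 2
Ok, Ck, s = cca_steps_1_to_3(Sff, Spp, Sfp, n=2)
```

With an exactly rank-$n$ $\Sigma_{fp|u}$, as in this toy case, $\mathcal{O}_k\mathcal{C}_k$ reproduces $\Sigma_{fp|u}$. With sample covariances, the product is only the rank-$n$ approximation.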
Algorithm A: Realization-based Approach 
Step A4: Compute $A$ and $C$ by

$$A = \mathcal{O}_k(1:(k-1)p,\ 1:n)^{\dagger}\,\mathcal{O}_k(p+1:kp,\ 1:n)$$
$$C = \mathcal{O}_k(1:p,\ 1:n)$$

²In general, $\Sigma_{ff|u}$ and/or $\Sigma_{pp|u}$ may be nearly rank deficient. We therefore use svd rather than chol to compute $L$ and $M$, and the inverses are replaced by pseudo-inverses.
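Step A4 exploits the shift invariance $\mathcal{O}_k(p+1:kp,\,:) = \mathcal{O}_k(1:(k-1)p,\,:)\,A$. The NumPy sketch below, with a hypothetical function name and test system, applies it to an observability matrix expressed in an arbitrary state basis. The estimates $A$ and $C$ are then similar to the true ones.

```python
import numpy as np

def a_c_from_observability(Ok, p):
    """Step A4: A from the shift invariance of O_k, C from its first block row."""
    A = np.linalg.pinv(Ok[:-p, :]) @ Ok[p:, :]   # O_k(1:(k-1)p,:)^+ O_k(p+1:kp,:)
    C = Ok[:p, :]                                # O_k(1:p,:)
    return A, C

# Hypothetical check: O_k of (A0, C0), expressed in another state basis via T
A0 = np.array([[0.9, 0.1], [0.0, 0.5]])
C0 = np.array([[1.0, 0.0], [1.0, 1.0]])
k, p = 5, 2
Ok = np.vstack([C0 @ np.linalg.matrix_power(A0, i) for i in range(k)])
T = np.array([[2.0, 1.0], [0.5, 1.0]])
A, C = a_c_from_observability(Ok @ np.linalg.inv(T), p)
```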
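Step A5: Given $A$, $C$ and $\Psi_k$ of (10.49), compute $B$ and $D$ by applying the least-squares method to

$$\begin{bmatrix} I_p & 0_{p\times n} \\ 0 & \mathcal{O}_{k-1} \\ I_p & 0_{p\times n} \\ 0 & \mathcal{O}_{k-2} \\ \vdots & \vdots \\ I_p & 0_{p\times n} \end{bmatrix}\begin{bmatrix} D \\ B \end{bmatrix} = \begin{bmatrix} \Psi_k(1:kp,\ 1:m) \\ \Psi_k(p+1:kp,\ m+1:2m) \\ \vdots \\ \Psi_k((k-1)p+1:kp,\ (k-1)m+1:km) \end{bmatrix}$$

where $\mathcal{O}_j = \mathcal{O}_k(1:jp,\ 1:n)$, $j < k$.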
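The overdetermined system of Step A5 stacks, for each block column $j$ of $\Psi_k$, the relation $[D;\,CB;\,\ldots] = [I_p\ \ 0;\ 0\ \ \mathcal{O}_{k-1-j}][D;\,B]$. The NumPy sketch below assembles these blocks by slicing $\mathcal{O}_k$, as in the text. The function name is hypothetical. The check uses a hypothetical system whose $\Psi_k$ is the exact block Toeplitz matrix of Markov parameters, so that $B$ and $D$ are recovered exactly.

```python
import numpy as np

def b_d_from_psi(Psi, Ok, p, m, k):
    """Step A5: least-squares estimate of (B, D) from the block columns of Psi_k."""
    n = Ok.shape[1]
    blocks, rhs = [], []
    for j in range(k):                                   # j-th block column of Psi_k
        Oj = Ok[: (k - 1 - j) * p, :]                    # O_{k-1-j} = O_k(1:(k-1-j)p, 1:n)
        top = np.hstack([np.eye(p), np.zeros((p, n))])
        bottom = np.hstack([np.zeros((Oj.shape[0], p)), Oj])
        blocks.append(np.vstack([top, bottom]))
        rhs.append(Psi[j * p:, j * m:(j + 1) * m])       # Psi_k(jp+1:kp, jm+1:(j+1)m)
    DB, *_ = np.linalg.lstsq(np.vstack(blocks), np.vstack(rhs), rcond=None)
    return DB[p:, :], DB[:p, :]                          # B, D

# Hypothetical test system and its exact Psi_k
A0 = np.array([[0.7, 0.3], [-0.2, 0.4]]); B0 = np.array([[1.0], [-0.5]])
C0 = np.array([[1.0, 0.5]]); D0 = np.array([[0.3]])
k, p, m = 4, 1, 1
Ok = np.vstack([C0 @ np.linalg.matrix_power(A0, i) for i in range(k)])
Psi = np.zeros((k * p, k * m))
for i in range(k):
    for j in range(i + 1):
        Psi[i*p:(i+1)*p, j*m:(j+1)*m] = D0 if i == j else C0 @ np.linalg.matrix_power(A0, i - j - 1) @ B0
B, D = b_d_from_psi(Psi, Ok, p, m, k)
```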


Remark 10.3. Comparing the LQ decompositions (10.46) and (9.48), we observe that, since the data are the same except for their arrangement, $R_{32}$ of (10.46) corresponds to $[L_{42}\ \ L_{43}]$ of (9.48). Thus the construction of the extended observability matrix is quite similar to that of PO-MOESP [171]; see also Remark 9.3. The two methods differ in that the CCA method uses the normalized SVD of the conditional covariance matrix.


We next present a subspace identification algorithm based on the use of the state estimates. Steps 1 to 3 are the same as those of Algorithm A.


Algorithm B: Regression Approach Using State Vector 
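Step B4: The estimate of the state vector is given by [see (10.22)]

$$\hat{X}_k = \mathcal{C}_k\Sigma_{pp|u}^{-1}W_{0|k-1} = \hat{S}^{1/2}\hat{V}^TM^{-1}W_{0|k-1} \in \mathbb{R}^{n\times N}$$

Then compute the matrices with $N-1$ columns

$$\hat{X}_{k+1} = \hat{X}_k(:,\ 2:N), \qquad \hat{X}_k := \hat{X}_k(:,\ 1:N-1)$$
$$Y_{k|k} := Y_{k|k}(:,\ 1:N-1), \qquad U_{k|k} := U_{k|k}(:,\ 1:N-1)$$

where $Y_{k|k}$ and $U_{k|k}$ are the first block rows of $Y_{k|2k-1}$ and $U_{k|2k-1}$. Here $\hat{X}_{k+1}$, the state vector at time $k+1$, is obtained by shifting $\hat{X}_k$ under the assumption that $k$ is sufficiently large.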
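Step B4 amounts to one matrix product followed by column slicing. The NumPy sketch below takes the truncated SVD factors from Step 2 and the factor $M$ from Step 1 as given. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def step_b4(s_hat, V_hat_t, M, W, Y, U, p, m):
    """State estimate X_k = S^{1/2} V^T M^{-1} W_{0|k-1} and the shifted data of Step B4.

    s_hat: the n retained singular values; V_hat_t: the corresponding rows of V^T.
    """
    Xk = (np.sqrt(s_hat)[:, None] * V_hat_t) @ np.linalg.pinv(M) @ W   # n x N
    X_next = Xk[:, 1:]          # X_{k+1} = X_k(:, 2:N)
    X_now = Xk[:, :-1]          # X_k(:, 1:N-1)
    Y_kk = Y[:p, :-1]           # first block row of Y_{k|2k-1}, columns 1:N-1
    U_kk = U[:m, :-1]           # first block row of U_{k|2k-1}, columns 1:N-1
    return Xk, X_next, X_now, Y_kk, U_kk

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 6)); Y = rng.standard_normal((2, 6)); U = rng.standard_normal((3, 6))
s_hat = np.array([4.0, 1.0]); V_hat_t = np.eye(4)[:2]; M = 2.0 * np.eye(4)
Xk, X_next, X_now, Y_kk, U_kk = step_b4(s_hat, V_hat_t, M, W, Y, U, p=1, m=1)
```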


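Step B5: Compute the estimates of the system matrices $(A, B, C, D)$ by applying the least-squares method to the following overdetermined equations

$$\begin{bmatrix} \hat{X}_{k+1} \\ Y_{k|k} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} \hat{X}_k \\ U_{k|k} \end{bmatrix} + \begin{bmatrix} \rho_w \\ \rho_e \end{bmatrix}$$

where $\rho_w \in \mathbb{R}^{n\times(N-1)}$ and $\rho_e \in \mathbb{R}^{p\times(N-1)}$ are residuals.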
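The least-squares problem of Step B5 can be solved in one call once the regressors are stacked. The NumPy sketch below uses a hypothetical function name. It is checked on noise-free data generated by a known hypothetical system, where the fit must be exact and the residuals vanish.

```python
import numpy as np

def step_b5(X_next, X_now, Y_kk, U_kk):
    """Least squares for [X_{k+1}; Y_{k|k}] = [[A, B], [C, D]] [X_k; U_{k|k}] + [rho_w; rho_e]."""
    n = X_now.shape[0]
    Z = np.vstack([X_now, U_kk])                   # regressors, (n+m) x (N-1)
    Tgt = np.vstack([X_next, Y_kk])                # left-hand side, (n+p) x (N-1)
    ThetaT, *_ = np.linalg.lstsq(Z.T, Tgt.T, rcond=None)
    Theta = ThetaT.T                               # [[A, B], [C, D]]
    resid = Tgt - Theta @ Z                        # [rho_w; rho_e]
    return Theta[:n, :n], Theta[:n, n:], Theta[n:, :n], Theta[n:, n:], resid

# Hypothetical noise-free data generated by a known (A0, B0, C0, D0)
rng = np.random.default_rng(5)
A0 = np.array([[0.6, 0.2], [0.0, 0.3]]); B0 = np.array([[1.0], [0.5]])
C0 = np.array([[1.0, -1.0]]); D0 = np.array([[0.1]])
X_now = rng.standard_normal((2, 50)); U_kk = rng.standard_normal((1, 50))
X_next = A0 @ X_now + B0 @ U_kk
Y_kk = C0 @ X_now + D0 @ U_kk
A, B, C, D, resid = step_b5(X_next, X_now, Y_kk, U_kk)
```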
Step B6: Compute the error covariance matrices

$$\begin{bmatrix} \Sigma_{ww} & \Sigma_{we} \\ \Sigma_{ew} & \Sigma_{ee} \end{bmatrix} = \frac{1}{N-1}\begin{bmatrix} \rho_w\rho_w^T & \rho_w\rho_e^T \\ \rho_e\rho_w^T & \rho_e\rho_e^T \end{bmatrix}$$

and solve the Kalman filter ARE [see (8.83)]

$$P = APA^T - (APC^T + \Sigma_{we})(CPC^T + \Sigma_{ee})^{-1}(APC^T + \Sigma_{we})^T + \Sigma_{ww}$$

Then, by using the stabilizing solution, the Kalman gain is given by

$$K = (APC^T + \Sigma_{we})(CPC^T + \Sigma_{ee})^{-1}$$

where the matrix $A - KC$ is stable.
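Step B6 can be sketched as follows. Here the filter ARE is solved simply by iterating the Riccati difference equation from $P = 0$. Under the usual detectability and stabilizability conditions this iteration reaches the stabilizing solution, but a dedicated ARE solver would normally be preferred. The function name and the gain $K_0$ in the example are hypothetical. The example uses residuals of the exact innovation form $\rho_w = K_0 e$, $\rho_e = e$, for which the text predicts $P = 0$ and $K = K_0$.

```python
import numpy as np

def step_b6(resid, A, C, n_iter=1000):
    """Residual covariances of Step B6 and the Kalman gain from the filter ARE."""
    n = A.shape[0]
    Sig = resid @ resid.T / resid.shape[1]           # 1/(N-1) [rho_w; rho_e][rho_w; rho_e]^T
    Sww, Swe, See = Sig[:n, :n], Sig[:n, n:], Sig[n:, n:]
    P = np.zeros((n, n))
    for _ in range(n_iter):                          # Riccati iteration toward the ARE solution
        G = A @ P @ C.T + Swe
        P = A @ P @ A.T - G @ np.linalg.solve(C @ P @ C.T + See, G.T) + Sww
    G = A @ P @ C.T + Swe
    K = G @ np.linalg.inv(C @ P @ C.T + See)
    return P, K, (Sww, Swe, See)

# Exact innovation-type residuals (hypothetical K0): expect P = 0 and K = K0
rng = np.random.default_rng(6)
A = np.array([[0.6, 0.2], [0.0, 0.3]]); C = np.array([[1.0, -1.0]])
K0 = np.array([[0.5], [0.2]])
e = rng.standard_normal((1, 400))
P, K, _ = step_b6(np.vstack([K0 @ e, e]), A, C)
```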
A program of Algorithm B is listed in Table D.6. 


The above procedure is correct for infinitely long past data and $N \to \infty$. Indeed, it follows from Lemma 10.5 that the exact relations $\Sigma_{ee} = \Lambda = E\{e(t)e^T(t)\}$, $\Sigma_{ww} = K\Sigma_{ee}K^T$ and $\Sigma_{we} = K\Sigma_{ee}$ should hold. Hence the unique stabilizing solution of the Kalman filter ARE above exists and is actually $P = 0$.

For finite data, these exact relations do not hold, and the sample covariance matrices computed in Step B6 vary with $k$ and $N$. However, under the assumption that the data are generated by a true system of order $n$, the procedure provides consistent estimates if $N$ and $k$ are chosen large enough with $N \gg k$. It should be noted that the Kalman filter ARE has a unique stabilizing solution $P > 0$, from which we can estimate $K$. This is because, by construction of the extended observability matrix $\mathcal{O}_k$, the pair $(C, A)$ is observable, and the covariance matrix of the residuals is generically nonnegative definite.


10.7 Numerical Examples 


We show some numerical results obtained by the CCA method, together with results 
by the ORT and PO-MOESP methods. We employ Algorithm B, which is based on 
the use of the estimate of the state vector. 


Figure 10.4. Simulation model 


We consider the simulation model shown in Figure 10.4, from which the input-output relation of the system is expressed as

$$y(t) = P(z)u_p(t) + v_1(t), \qquad u_p(t) = u(t) + v_2(t) \qquad (10.50)$$

where $v_1$ and $v_2$ are zero-mean noises acting additively on the output and input signals, respectively. The plant $P(z)$ is the same as the one used in Section 9.8, and is given by
Figure 10.5. Identification results by CCA
Figure 10.6. Identification results: (a) MOESP and (b) ORT


$$P(z) = \frac{0.0275z^{-4} + 0.0551z^{-5}}{1 - 2.3443z^{-1} + 3.081z^{-2} - 2.5274z^{-3} + 1.2415z^{-4} - 0.3686z^{-5}}$$


Case 1: We consider the case where a colored noise acts on the output, i.e., $v_1$ is colored noise and $v_2 = 0$. The plant input $u$ is a white noise with mean zero and variance $\sigma_u^2 = 1$. The colored noise $v_1$ is an ARMA process generated by $v_1 = H(z)e$, where the noise model is given by


1 —0.22-! — 0.482-2 
Goya o> ee 
() == pasa pes 


and $e$ is a zero-mean white noise with variance $\sigma_e^2$, whose value is adjusted so that the variance of the colored noise is approximately $\sigma_1^2 = 0.01$.

The Bode gain plots of the transfer functions $P(z)$ and $H(z)$ are displayed in Figure 9.8 in Section 9.8³. The CCA and PO-MOESP methods are based on innovation models and implicitly assume that the plant and noise models have the same poles. In the simulation model of Figure 10.4, however, the plant and noise models have different poles, which is inconsistent with the premise of the CCA and PO-MOESP methods. In fact, as shown below, the results by the CCA and PO-MOESP methods are biased.

³The simulation conditions of the present example are the same as those of the example in Section 9.8; see Figures 9.9(b) and 9.10(b).

As mentioned in Chapter 9, however, the ORT method conforms with this sim- 
ulation model, since the ORT is based on the state space model with independent 
parametrizations for the plant and noise models. Hence, we expect that the ORT will 
provide better results than the CCA and PO-MOESP methods. 

Taking the number of data $N = 1000$ and the number of block rows $k = 15$, we performed 30 simulation runs. Figure 10.5(a) displays the poles of the plant identified by the CCA method, where + denotes the true poles of the plant and • denotes the poles identified in the 30 simulation runs. Figure 10.5(b) displays the Bode plots of the identified plant transfer functions, where the true gain is shown by the solid curve.

For comparison, Figures 10.6(a) and 10.6(b) display the poles of the plants identified by the PO-MOESP and ORT methods, respectively. In Figures 10.5(a) and 10.6(a), we observe rather large biases in the pole estimates. As shown in Figure 10.6(b), however, the results by the ORT method show no biases. Moreover, the results by the CCA method are somewhat better than those by the PO-MOESP method.

Case 2: We consider the case where $v_1$ and $v_2$ in Figure 10.4 are mutually uncorrelated white Gaussian noises, so that $H(z) = 1$.

First we show that in this case the model of Figure 10.4 reduces to an innovation model of the same form as the one derived in Theorem 10.2. The effect of the noise $v_2$ on the output is $P(z)v_2$, so the input-output relation (10.50) is described by

$$y(t) = P(z)u(t) + \begin{bmatrix} 1 & P(z) \end{bmatrix}\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix} \qquad (10.51)$$

where the noise model is the $1 \times 2$ transfer matrix $L(z) = [1\ \ P(z)]$; thus the plant and the noise model have the same poles. Hence, the transfer matrix model (10.51) is a special case of the innovation model (see Figure 10.3)


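$$y(t) = P(z)u(t) + H(z)e(t)$$

where $H(z)$ is a minimum phase transfer function satisfying

$$\sigma_e^2 |H(e^{j\omega})|^2 = L(e^{j\omega})\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}L^T(e^{-j\omega}) = \sigma_1^2 + \sigma_2^2|P(e^{j\omega})|^2$$

with $H(\infty) = 1$.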

The transfer function $H(z)$ can be obtained by spectral factorization. In fact, by deriving a state space model for the noise $L(z)v$ and solving the associated ARE, we obtain an innovation model, from which we have the desired transfer function $H(z)$. Thus, in this case, the model of Figure 10.4 is compatible with the CCA method.
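The spectral factorization just described can be sketched in NumPy as follows. The noise $v_1 + P(z)v_2$ is realized as $x(t+1) = Ax(t) + Bv_2(t)$, $y_n(t) = Cx(t) + v_1(t)$. The associated Kalman filter ARE is solved by iterating the Riccati equation, which gives $H(z) = 1 + C(zI - A)^{-1}K$ and $\sigma_e^2 = CPC^T + \sigma_1^2$. The second-order plant below is hypothetical, not the fifth-order $P(z)$ of the example. The noise variances are those used in Case 2.

```python
import numpy as np

def noise_innovation_model(A, B, C, s1, s2, n_steps=2000):
    """Spectral factor of v1 + P(z) v2 with P(z) = C (zI - A)^{-1} B.

    Returns the Kalman gain K and sigma_e^2, so that H(z) = 1 + C (zI - A)^{-1} K.
    """
    n = A.shape[0]
    Q = s2 * (B @ B.T)                      # covariance of B v2; v1 and v2 uncorrelated
    P = np.zeros((n, n))
    for _ in range(n_steps):                # Riccati iteration, no cross term
        Re = C @ P @ C.T + s1
        K = A @ P @ C.T @ np.linalg.inv(Re)
        P = A @ P @ A.T - K @ Re @ K.T + Q
    Re = C @ P @ C.T + s1
    K = A @ P @ C.T @ np.linalg.inv(Re)
    return K, Re.item()

# Hypothetical second-order plant; variances sigma_1^2 = 0.01, sigma_2^2 = 0.09
A = np.array([[0.7, 0.2], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
K, se2 = noise_innovation_model(A, B, C, s1=0.01, s2=0.09)

def H_and_P(w):
    """Return H(e^{jw}) and P(e^{jw}) for the example above."""
    G = np.linalg.inv(np.exp(1j * w) * np.eye(2) - A)
    return 1 + (C @ G @ K).item(), (C @ G @ B).item()
```

The resulting pair $(H, \sigma_e^2)$ reproduces the spectrum $\sigma_1^2 + \sigma_2^2|P(e^{j\omega})|^2$, and the zeros of $H$ (the eigenvalues of $A - KC$) lie inside the unit circle.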
(a) ORT ($v_1$, $v_2$ = white)   (b) ORT ($v_1$, $v_2$ = white)


Figure 10.8. Identification results by ORT 


Numerical results by the CCA and ORT methods are displayed in Figures 10.7 and 10.8, respectively. Here it is assumed that $\sigma_1^2 = 0.01$, $\sigma_2^2 = 0.09$, $\sigma_u^2 = 1$, the number of data is $N = 1000$, and the number of block rows is $k = 15$. Since both $v_1$ and $v_2$ are white noises, we do not see appreciable biases in the poles of the identified plants, though the estimates show some variation. This is because the present simulation model is compatible with the CCA method as well as with the ORT method.

In the simulations above, we have fixed the number of data N and the number of 
block rows k. In the next case, we present some simulation results by the CCA and 
ORT methods by changing the number of block rows k. 

Case 3: We present some simulation results by the CCA and ORT methods 
by changing the number of block rows as k = 8, 10,15, 20, while the number of 
columns of data matrices is fixed as N = 4000. The simulation model is the same 
as in Case 2, where there exist both input and output white noises, and the noise 
variances are fixed as 07 = 1 and a3 = 0.09. 

We see from Figure 10.9 that the performance of identification by the CCA method is rather insensitive to the value of $k$. It is generally recommended that $k$ be sufficiently large ($k > n$), but the present results show that this recommendation does not always hold. We can thus say that the number of block rows $k$ should be chosen in relation to the number of columns $N$.

Figure 10.9. Identification results by CCA

Figure 10.10 displays the results of identification by the ORT method. In contrast to the CCA results, the performance improves with larger $k$, while for small $k$ we see some variation in the poles of the identified plant. This may be because the computation of the deterministic components by the LQ decomposition is not very accurate when $k$ is small.

In this section, we have compared simulation results obtained by the CCA, PO-MOESP and ORT methods. The CCA method performs slightly better than PO-MOESP, where both methods are based on innovation models. We also conclude that the ORT method performs better than the CCA method, especially when a general noise model is used.


10.8 Notes and References 


• In this chapter, we have described some stochastic realization results in the presence of exogenous inputs based on Katayama and Picci [90]. As in Chapter 9, we have assumed that there is no feedback from the output to the input, and that the input satisfies the PE condition of sufficiently high order.
(c) ORT (N = 4000, k = 15)   (d) ORT (N = 4000, k = 20)
Figure 10.10. Identification results by ORT


• In Section 10.1, after stating the stochastic realization problem in the presence of an exogenous input, we considered a multistage Wiener prediction problem: estimating the future outputs in terms of the past input-output data and the future inputs. This problem is solved by using the oblique projection, and the optimal predictor for the future outputs is derived in Section 10.2.


• In Section 10.3, by defining the conditional CCA, we have obtained a state vector for a stochastic system that contains the information in the past data needed to predict the future. In Section 10.4, the state vector so defined is employed to derive a forward innovation model for the system with an exogenous input.

• In Section 10.5, we have provided a theoretical foundation for adapting the stochastic realization theory to finite input-output data, and derived a nonstationary innovation model based on the transient Kalman filter. In Section 10.6, by means of the LQ decomposition and SVD, we have derived two subspace identification methods, and clarified the relation of the CCA method to the N4SID method. In Section 10.7, some simulation results are included.


In the following, some comments are provided for the CCA method developed 
in this chapter and the ORT method in Chapter 9, together with some other methods. 
Earlier papers dealing with subspace identification methods based on the CCA are [100, 101, 128]. The method of Larimore, called the CVA method, is based on the solution of the $\ell$-step prediction problem, and has been applied to the identification of many industrial plants; see [102] and references therein.


In the ORT method developed in Chapter 9, as shown in Figure 10.11, we start with the decomposition of the output $y$ into a deterministic component $y_d \in U$ and a stochastic component $y_s \in U^{\perp}$. This divides the problem into two identification problems, one for the deterministic subsystem and one for the stochastic subsystem. Hence, from the point of view of identifying the plant, the ORT method is similar to deterministic subspace identification methods, and the identification of the noise model is a version of the standard stochastic subspace identification method for stationary processes.


Figure 10.11. Orthogonal decomposition


On the other hand, the CCA method is based on the conditional canonical cor- 
relations between the future and the past after deleting the effects of the future 
inputs, so that this method is regarded as an extension of the CCA method due to 
Akaike [2,3], Desai et al. [42,43], and Larimore [100, 101]. 


11 


Identification of Closed-loop System 


This chapter discusses the identification of closed-loop systems based on the sub- 
space identification methods developed in the previous chapters. First we explain 
three main approaches to the closed-loop identification. Then, in the framework of 
the joint input-output approach, we consider the stochastic realization problem of 
the closed-loop system by using the CCA method, and derive a subspace method of 
identifying the plant and controller. Also, we consider the same problem based on the 
ORT method, deriving a subspace method of identifying the plant and controller by 
using the deterministic component of the joint input-output process. Further, a model 
reduction method is introduced to get lower order models. Some simulation results 
are included. In the appendix, under the assumption that the system is open-loop 
stable, we present simple methods of identifying the plant, controller and the noise 
model from the deterministic and stochastic components of the joint input-output 
process, respectively. 


11.1 Overview of Closed-loop Identification 


The identification problem for linear systems operating in closed-loop has received 
much attention in the literature, since closed-loop experiments are necessary if the 
open-loop plant is unstable, or the feedback is an inherent mechanism of the system 
[48, 145, 158]. Also, safety and maintaining high-quality production may prohibit 
experiments in open-loop setting. 

The identification of multivariable systems operating in closed loop by subspace methods has been a topic of active research in the past decade. For example, in [161], the joint input-output approach is used to derive the state space models of subsystems, followed by a balanced model reduction. Also, based on a subspace method, a technique for identifying the state space model of a plant operating in closed loop has been studied by reformulating it as an equivalent open-loop identification problem [170]. In addition, by modifying the N4SID method [165], a closed-loop subspace identification method has been derived under the assumption that a finite number of Markov parameters of the controller are known [167]. Finally, subspace-based closed-loop identification of linear state space models has been treated by using the CCA technique [34].

Figure 11.1. A feedback system [48]

Figure 11.1 shows a typical feedback control system, where $P(z)$, $C(z)$ and $H(z)$ denote the plant, the controller and the noise model, respectively. Further, $r$ is the exogenous input, $u$ the control input, $y$ the plant output, and $v$ the unmeasurable disturbance. A standard closed-loop identification problem is to identify the plant based on the measurable exogenous input $r$ and the plant input $u$ and output $y$.

A fundamental difficulty with closed-loop identification is the correlation between the external unmeasurable noise $v$ and the control input $u$. In fact, it is well known that the least-squares method provides a biased estimate of the plant if $u$ and $v$ are correlated¹. This is also true for subspace identification methods. Recall that we have assumed in Chapters 9 and 10 that there is no feedback from the output $y$ to the input $u$, which is a basic condition for open-loop system identification.

Closed-loop identification methods can be classified into three groups [48, 109], which we now review.


1. Direct Approach: Ignoring the existence of the feedback loop, we directly apply open-loop identification methods to the measurable input-output data $(u, y)$ to identify the plant $P(z)$.

2. Indirect Approach: Suppose that the exogenous input $r$ is available for identification and that the controller transfer function $C(z)$ is known. We first identify the transfer function $T_{yr}(z)$ from $r$ to the output $y$, and then compute the plant transfer function by using the formula

$$P(z) = \frac{T_{yr}(z)}{1 - C(z)T_{yr}(z)} \qquad (11.1)$$

3. Joint Input-Output Approach: Suppose that there exists an input $r$ that can be used for system identification. We first identify the transfer functions $T_{yr}(z)$ and $T_{ur}(z)$ from the exogenous input $r$ to the joint input-output $(u, y)$, and then compute the plant transfer function using the algebraic relation

$$P(z) = \frac{T_{yr}(z)}{T_{ur}(z)} \qquad (11.2)$$

¹This situation corresponds to the case where Assumption A1) in Section A.1 is violated.
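Both relations can be checked on the unit circle. The NumPy sketch below assumes that the loop in Figure 11.1 is $u = r - C(z)y$, $y = P(z)u + v$. This is our reading of the figure, which is not reproduced here, and under it both (11.1) and (11.2) hold. The plant and controller are hypothetical first-order examples. Note that (11.1) needs $C(z)$, whereas (11.2) does not.

```python
import numpy as np

def plant_indirect(Tyr, Cz):
    """(11.1): P = T_yr / (1 - C T_yr); needs the controller response C."""
    return Tyr / (1.0 - Cz * Tyr)

def plant_joint(Tyr, Tur):
    """(11.2): P = T_yr / T_ur; the controller is not needed."""
    return Tyr / Tur

# Hypothetical SISO loop u = r - C(z) y, y = P(z) u + v, evaluated on the unit circle
z = np.exp(1j * np.linspace(0.05, np.pi, 20))
Pz = 0.5 / (z - 0.8)
Cz = 0.4 + 0.1 / (z - 0.5)
S = 1.0 / (1.0 + Pz * Cz)          # sensitivity function
Tyr, Tur = Pz * S, S               # closed-loop responses from r to y and from r to u
```

In practice $T_{yr}$ and $T_{ur}$ are identified models, and combining them as in (11.1) or (11.2) gives the higher-order plant estimates discussed below.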


We shall now provide some comments on the basic approaches to closed-loop identification stated above.


It is clear that the direct approach provides biased estimates. However, since the procedure is very simple, this approach is practical if the bias is not significant. To overcome the difficulty associated with the biases, modified methods called the two-stage least-squares method and the projection method have been developed in [49, 160]. The basic idea is to identify the sensitivity function of the closed-loop system by using ARMA or finite impulse response (FIR) models. These models generate an estimate $\hat{u}$ of the input $u$ from which the noise effects have been removed. Then, the estimated input $\hat{u}$ and the output $y$ are employed to identify the plant transfer function using a standard open-loop identification technique.


The indirect approach requires knowledge of the controller transfer function. However, the quality of the estimates will be degraded by possible deterioration of the controller characteristics and/or the inclusion of nonlinearities such as limiters and dead zones. Moreover, the estimate of $P(z)$ obtained by (11.1) is of higher order, typically the sum of the orders of $T_{yr}(z)$ and $C(z)$, so that some model reduction procedure is needed. There are also related methods using the dual Youla parametrization, which parametrizes all the plants stabilized by a given controller. By using the dual Youla parametrization, the closed-loop identification problem is converted into an open-loop identification problem; see [70, 141, 159] for details.


The advantage of the joint input-output approach is that knowledge of the controller is not needed. However, it shares a disadvantage with the indirect approach: the estimated plant transfer functions are of higher order. It should also be noted that this approach must deal with vector processes even when the system to be identified is scalar. In this sense, the joint input-output approach is best studied in the framework of subspace methods.


11.2 Problem Formulation 


11.2.1 Feedback System 


We consider the problem of identifying a closed-loop system based on input-output measurements. The configuration of the system is shown in Figure 11.2, where $y \in \mathbb{R}^p$ is the output vector of the plant and $u \in \mathbb{R}^m$ is the input vector. The noise models $H(z)$ and $F(z)$ are minimum phase square rational transfer matrices with $H(\infty) = I_p$ and $F(\infty) = I_m$. Their inputs are the white noises $v \in \mathbb{R}^p$ and $\eta \in \mathbb{R}^m$, respectively, with zero means and positive definite covariance matrices. The inputs $r_1 \in \mathbb{R}^p$ and $r_2 \in \mathbb{R}^m$ may be interpreted as the exogenous reference signal and as a probing input (dither) or a measurable disturbance, respectively.
Figure 11.2. Closed-loop system


Let the plant be a finite dimensional LTI system described by

$$y(t) = P(z)u(t) + H(z)v(t) \qquad (11.3)$$

where $P(z)$ is the $(p \times m)$-dimensional transfer matrix of the plant. Also, the control input is generated by

$$u(t) = r_2(t) + C(z)[r_1(t) - y(t)] + F(z)\eta(t) \qquad (11.4)$$

where $C(z)$ denotes the $(m \times p)$-dimensional transfer matrix of the LTI controller.
We introduce the following assumptions on the closed-loop system, exogenous 
inputs, and noises. 


Assumption 11.1. A1) The closed-loop system is well-posed in the sense that $(u, y)$ are uniquely determined by the states of the plant and controller and by the exogenous inputs and noises. This generic condition is satisfied if $I_p + P(\infty)C(\infty)$ and $I_m + C(\infty)P(\infty)$ are nonsingular. For the sake of simplicity, the plant is assumed to be strictly proper, i.e., $P(\infty) = 0$.

A2) The controller internally stabilizes the closed-loop system.

A3) The exogenous input $r := \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \in \mathbb{R}^d$ $(d = p + m)$ satisfies the PE condition, and is uncorrelated with the noise $x := \begin{bmatrix} v \\ \eta \end{bmatrix} \in \mathbb{R}^d$; thus $r_1(t)$, $r_2(s)$, $v(\tau)$, $\eta(\sigma)$ are uncorrelated for all $t, s, \tau, \sigma \in \mathbb{Z}$.


In the following, we consider the problem of identifying the deterministic part of the closed-loop system, i.e., the plant $P(z)$ and the controller $C(z)$, using the measurable finite data $\{r_1(t), r_2(t), u(t), y(t),\ t = 0, 1, \cdots, N-1\}$.


Remark 11.1. The identification of the controller $C(z)$ may not be needed in applications. However, if the identified controller agrees well with the known controller transfer function, this is evidence that the identification results are plausible. Also, many chemical plants contain recycle paths of energy and materials, which makes the identification of closed-loop systems very important from both theoretical and practical points of view.
The objective of this chapter is to obtain state space models of the plant $P(z)$ and the controller $C(z)$ from finite measurement data $\{r_1(t), r_2(t), u(t), y(t)\}$ by using subspace identification methods. In the following, we present two closed-loop identification algorithms based on the CCA and ORT methods. The first one, based on the CCA method, is rather close to that of Verhaegen [170]. The second one, based on the ORT method, is quite different from existing closed-loop identification algorithms, including that of [170].


11.2.2 Identification by Joint Input-Output Approach 


In order to obtain state space models of the plant and controller in closed loop, we use the joint input-output approach. Define the joint input-output process

$$w(t) = \begin{bmatrix} y(t) \\ u(t) \end{bmatrix} \in \mathbb{R}^d \qquad (11.5)$$


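It then follows from Figure 11.2 that these signals are related by

$$w(t) = T_{wr}(z)r(t) + T_{wx}(z)x(t) \qquad (11.6)$$

where $T_{wr}(z)$ and $T_{wx}(z)$ are the closed-loop system transfer matrices defined by

$$T_{wr}(z) = \begin{bmatrix} T_{yr_1}(z) & T_{yr_2}(z) \\ T_{ur_1}(z) & T_{ur_2}(z) \end{bmatrix} = \begin{bmatrix} P(z)S_i(z)C(z) & P(z)S_i(z) \\ S_i(z)C(z) & S_i(z) \end{bmatrix} \qquad (11.7)$$

and

$$T_{wx}(z) = \begin{bmatrix} T_{yv}(z) & T_{y\eta}(z) \\ T_{uv}(z) & T_{u\eta}(z) \end{bmatrix} = \begin{bmatrix} S_o(z)H(z) & P(z)S_i(z)F(z) \\ -C(z)S_o(z)H(z) & S_i(z)F(z) \end{bmatrix}$$

and where

$$S_i(z) = (I_m + C(z)P(z))^{-1}, \qquad S_o(z) = (I_p + P(z)C(z))^{-1}$$

are the input and output sensitivity matrices, respectively.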

Recall that the feedback system is internally stable if and only if the four transfer matrices in (11.7) are stable. Since $r$ and $x$ are uncorrelated in (11.6), there is no feedback from $w$ to $r$. Hence we can employ open-loop identification techniques to estimate the transfer matrix $T_{wr}(z) = [T_{wr_1}(z)\ \ T_{wr_2}(z)]$ from measurements of the input $r$ and the output $w$.

For the estimation problem to be well posed, the plant and controller transfer matrices should be uniquely obtainable from the overall transfer matrix $T_{wr}(z)$. It follows from (11.7) that $P(z)$ and $C(z)$ are identifiable from

$$P(z) = T_{yr_2}(z)T_{ur_2}^{-1}(z), \qquad C(z) = T_{ur_2}^{-1}(z)T_{ur_1}(z) \qquad (11.8)$$

where the inverse exists because $S_i(z)$ is invertible. Hence, unlike the indirect approach [167], we need neither knowledge of the controller nor the auxiliary input required by the method based on the dual Youla parametrization. It should, however, be noted that for both $P(z)$ and $C(z)$ to be uniquely identifiable from the data, we in general need both signals $r_1$ and $r_2$ acting on the system².
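Relation (11.8) can be verified at a single frequency point. The NumPy sketch below builds the blocks of $T_{wr}$ in (11.7) from hypothetical complex values of $P(z_0)$ and $C(z_0)$ and recovers both from the blocks. It uses linear solves in place of explicit inverses.

```python
import numpy as np

def loop_blocks(Pz, Cz):
    """Blocks of T_wr(z) in (11.7) at one frequency, from plant and controller values."""
    m = Cz.shape[0]
    Si = np.linalg.inv(np.eye(m) + Cz @ Pz)        # input sensitivity S_i(z)
    return Pz @ Si @ Cz, Pz @ Si, Si @ Cz, Si      # T_yr1, T_yr2, T_ur1, T_ur2

def plant_and_controller(Tyr2, Tur1, Tur2):
    """(11.8): P = T_yr2 T_ur2^{-1} and C = T_ur2^{-1} T_ur1."""
    P = np.linalg.solve(Tur2.T, Tyr2.T).T
    C = np.linalg.solve(Tur2, Tur1)
    return P, C

rng = np.random.default_rng(7)
p, m = 2, 3
Pz = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))   # hypothetical P(z0)
Cz = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))   # hypothetical C(z0)
Tyr1, Tyr2, Tur1, Tur2 = loop_blocks(Pz, Cz)
P_hat, C_hat = plant_and_controller(Tyr2, Tur1, Tur2)
```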

In addition to Assumption 11.1 A1) to A3), we need the following.


Assumption 11.2. There is no feedback from the joint input-output process w to the 
exogenous input r. 


11.3 CCA Method 


In this section, we apply the CCA method developed in Chapter 10 to the closed- 
loop identification problem of identifying the plant and controller based on the joint 
input-output approach. 


11.3.1 Realization of Joint Input-Output Process 


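It follows from Theorem 10.2 that the innovation model for the joint input-output process $w$ with the input $r$ has the following form

$$x(t+1) = Ax(t) + \begin{bmatrix} B_1 & B_2 \end{bmatrix}\begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} + K\begin{bmatrix} e_1(t) \\ e_2(t) \end{bmatrix} \qquad (11.9a)$$

$$\begin{bmatrix} y(t) \\ u(t) \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}x(t) + \begin{bmatrix} 0 & 0 \\ D_{21} & D_{22} \end{bmatrix}\begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} + \begin{bmatrix} e_1(t) \\ e_2(t) \end{bmatrix} \qquad (11.9b)$$

Here the dimension of the state vector is generically the sum of the orders of the plant and controller ($n = n_p + n_c$), and $D_{11} = 0$, $D_{12} = 0$ follow from the condition $P(\infty) = 0$.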


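We see from (11.9) that the transfer matrices from $r_1$ to $u$ and from $r_2$ to $y$ and $u$ are given by

$$T_{ur_1}(z) = \left[\begin{array}{c|c} A & B_1 \\ \hline C_2 & D_{21} \end{array}\right], \quad T_{yr_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_1 & 0 \end{array}\right], \quad T_{ur_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_2 & D_{22} \end{array}\right] \qquad (11.10)$$

Thus we have the following result.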


Lemma 11.1. Suppose that a realization of the joint input-output process $w$ is given by (11.9). Then, realizations of the plant and controller are respectively computed by

$$P(z) = \left[\begin{array}{c|c} A - B_2D_{22}^{-1}C_2 & B_2D_{22}^{-1} \\ \hline C_1 & 0 \end{array}\right] \qquad (11.11)$$

and

$$C(z) = \left[\begin{array}{c|c} A - B_2D_{22}^{-1}C_2 & B_1 - B_2D_{22}^{-1}D_{21} \\ \hline D_{22}^{-1}C_2 & D_{22}^{-1}D_{21} \end{array}\right] \qquad (11.12)$$

²The case where one of the two signals is absent is discussed in [89].


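Proof. Since $P = T_{yr_2}T_{ur_2}^{-1}$, it follows from (11.10) that

$$\begin{aligned}
P(z) &= \left[\begin{array}{c|c} A & B_2 \\ \hline C_1 & 0 \end{array}\right]\left[\begin{array}{c|c} A & B_2 \\ \hline C_2 & D_{22} \end{array}\right]^{-1}
= \left[\begin{array}{c|c} A & B_2 \\ \hline C_1 & 0 \end{array}\right]\left[\begin{array}{c|c} A - B_2D_{22}^{-1}C_2 & B_2D_{22}^{-1} \\ \hline -D_{22}^{-1}C_2 & D_{22}^{-1} \end{array}\right] \\
&= \left[\begin{array}{cc|c} A & -B_2D_{22}^{-1}C_2 & B_2D_{22}^{-1} \\ 0 & A - B_2D_{22}^{-1}C_2 & B_2D_{22}^{-1} \\ \hline C_1 & 0 & 0 \end{array}\right]
\end{aligned}$$

By the coordinate transform $T = \begin{bmatrix} I_n & -I_n \\ 0 & I_n \end{bmatrix}$ (new state $= T \times$ old state), this realization becomes

$$\left[\begin{array}{cc|c} A & 0 & 0 \\ 0 & A - B_2D_{22}^{-1}C_2 & B_2D_{22}^{-1} \\ \hline C_1 & C_1 & 0 \end{array}\right]$$

whose first block is unreachable. Deleting it, we obtain (11.11). Also, from the relation $C = T_{ur_2}^{-1}T_{ur_1}$, we can prove (11.12) similarly.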

It may be noted that the matrix $D_{22}$ should be nonsingular to compute the inverse matrix above. This implies that the exogenous input $r_2$ must satisfy the PE condition.
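Lemma 11.1 translates directly into a few matrix operations. The NumPy sketch below computes (11.11) and (11.12) from a joint realization and compares the frequency responses with those of a known plant and controller. The test data are a hypothetical plant/controller pair combined into a joint model of the form (11.9) through the interconnection (11.17) of Lemma 11.2. An arbitrary similarity transformation is then applied to mimic a realization obtained by identification.

```python
import numpy as np

def split_joint_realization(A, B1, B2, C1, C2, D21, D22):
    """Plant realization (11.11) and controller realization (11.12) from the joint model (11.9)."""
    D22i = np.linalg.inv(D22)                 # D22 must be nonsingular
    Af = A - B2 @ D22i @ C2                   # common state matrix of (11.11) and (11.12)
    p, m = C1.shape[0], B2.shape[1]
    plant = (Af, B2 @ D22i, C1, np.zeros((p, m)))
    ctrl = (Af, B1 - B2 @ D22i @ D21, D22i @ C2, D22i @ D21)
    return plant, ctrl

def freq_resp(sys, z):
    """Evaluate D + C (zI - A)^{-1} B at the complex point z."""
    A, B, C, D = sys
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B) + D

# Hypothetical plant (n_p = 2) and controller (n_c = 1), joined as in (11.17)
Ap = np.array([[0.6, 0.3], [-0.2, 0.5]]); Bp = np.array([[1.0], [0.4]]); Cp = np.array([[1.0, -0.5]])
Ac = np.array([[0.3]]); Bc = np.array([[0.8]]); Cc = np.array([[0.5]]); Dc = np.array([[0.2]])
A = np.block([[Ap - Bp @ Dc @ Cp, Bp @ Cc], [-Bc @ Cp, Ac]])
B1 = np.vstack([Bp @ Dc, Bc]); B2 = np.vstack([Bp, np.zeros((1, 1))])
C1 = np.hstack([Cp, np.zeros((1, 1))]); C2 = np.hstack([-Dc @ Cp, Cc])
T = np.array([[1.0, 0.2, 0.0], [0.1, 1.0, 0.3], [0.0, 0.4, 1.0]]); Ti = np.linalg.inv(T)
plant, ctrl = split_joint_realization(T @ A @ Ti, T @ B1, T @ B2, C1 @ Ti, C2 @ Ti, Dc, np.eye(1))
```

Both recovered realizations have state dimension $n_p + n_c = 3$, although the plant has order 2 and the controller order 1. Lemma 11.2 explains this redundancy.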
Let the state space models of the plant and controller be given by

$$x_p(t+1) = A_px_p(t) + B_pu(t) \qquad (11.13a)$$
$$y(t) = C_px_p(t) \qquad (11.13b)$$

and

$$x_c(t+1) = A_cx_c(t) + B_c[r_1(t) - y(t)] \qquad (11.14a)$$
$$u(t) = r_2(t) + C_cx_c(t) + D_c[r_1(t) - y(t)] \qquad (11.14b)$$

where $x_p \in \mathbb{R}^{n_p}$ and $x_c \in \mathbb{R}^{n_c}$ are the state vectors of the plant and controller, respectively. We show that the models (11.11) and (11.12) are not minimal.


Lemma 11.2. Suppose that realizations of the plant and controller are respectively given by (11.13) and (11.14). Then, the following realizations

$$P(z) = \left[\begin{array}{cc|c} A_p & 0 & B_p \\ -B_c C_p & A_c & 0 \\ \hline C_p & 0 & 0 \end{array}\right] \qquad (11.15)$$

and

$$C(z) = \left[\begin{array}{cc|c} A_p & 0 & 0 \\ -B_c C_p & A_c & B_c \\ \hline -D_c C_p & C_c & D_c \end{array}\right] \qquad (11.16)$$

are input-output equivalent to the realizations (11.11) and (11.12), respectively. Hence, the reachable and observable parts of these non-minimal realizations are the state space realizations of the plant and controller, respectively.
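Proof. Combining (11.13) and (11.14) yields

$$\begin{bmatrix} x_p(t+1) \\ x_c(t+1) \end{bmatrix} = \begin{bmatrix} A_p - B_p D_c C_p & B_p C_c \\ -B_c C_p & A_c \end{bmatrix} \begin{bmatrix} x_p(t) \\ x_c(t) \end{bmatrix} + \begin{bmatrix} B_p D_c & B_p \\ B_c & 0 \end{bmatrix} \begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix}$$

$$\begin{bmatrix} y(t) \\ u(t) \end{bmatrix} = \begin{bmatrix} C_p & 0 \\ -D_c C_p & C_c \end{bmatrix} \begin{bmatrix} x_p(t) \\ x_c(t) \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ D_c & I \end{bmatrix} \begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} \qquad (11.17)$$

For simplicity, we define

$$A := \begin{bmatrix} A_p - B_p D_c C_p & B_p C_c \\ -B_c C_p & A_c \end{bmatrix}, \quad B_1 := \begin{bmatrix} B_p D_c \\ B_c \end{bmatrix}, \quad B_2 := \begin{bmatrix} B_p \\ 0 \end{bmatrix}, \quad C_1 := \begin{bmatrix} C_p & 0 \end{bmatrix}, \quad C_2 := \begin{bmatrix} -D_c C_p & C_c \end{bmatrix}$$

Then, we have

$$T_{yr_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_1 & 0 \end{array}\right], \quad T_{ur_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_2 & I \end{array}\right]$$

so that

$$T_{ur_2}^{-1}(z) = \left[\begin{array}{c|c} A - B_2 C_2 & B_2 \\ \hline -C_2 & I \end{array}\right]$$

Thus, it follows that

$$P(z) = T_{yr_2}(z) T_{ur_2}^{-1}(z) = \left[\begin{array}{cc|c} A & -B_2 C_2 & B_2 \\ 0 & A - B_2 C_2 & B_2 \\ \hline C_1 & 0 & 0 \end{array}\right]$$

With $S = \begin{bmatrix} I & I \\ 0 & I \end{bmatrix}$, we have

$$S^{-1} \begin{bmatrix} A & -B_2 C_2 \\ 0 & A - B_2 C_2 \end{bmatrix} S = \begin{bmatrix} A & 0 \\ 0 & A - B_2 C_2 \end{bmatrix}, \quad S^{-1} \begin{bmatrix} B_2 \\ B_2 \end{bmatrix} = \begin{bmatrix} 0 \\ B_2 \end{bmatrix}, \quad \begin{bmatrix} C_1 & 0 \end{bmatrix} S = \begin{bmatrix} C_1 & C_1 \end{bmatrix}$$

Deleting the unreachable first block and noting that $A - B_2 C_2 = \begin{bmatrix} A_p & 0 \\ -B_c C_p & A_c \end{bmatrix}$, it therefore follows that

$$P(z) = \left[\begin{array}{c|c} A - B_2 C_2 & B_2 \\ \hline C_1 & 0 \end{array}\right] = \left[\begin{array}{cc|c} A_p & 0 & B_p \\ -B_c C_p & A_c & 0 \\ \hline C_p & 0 & 0 \end{array}\right]$$

The right-hand side is equal to (11.15). Similarly, for a proof of (11.16), we can use

$$T_{ur_1}(z) = \left[\begin{array}{c|c} A & B_1 \\ \hline C_2 & D_c \end{array}\right]$$

obtained from (11.17), and $C(z) = T_{ur_2}^{-1}(z) T_{ur_1}(z)$.

It follows from Theorems 3.4 (ii) and 3.7 (ii) that the realization (11.15) satisfies the rank condition

$$\operatorname{rank} \begin{bmatrix} zI - A_p & 0 \\ B_c C_p & zI - A_c \\ C_p & 0 \end{bmatrix} < n_p + n_c, \quad z \in \lambda(A_c)$$

implying that the $n_c$ modes of $A_c$ are unobservable, i.e., there are $n_c$ pole-zero cancellations in the realization of $P(z)$. Thus, this realization is not minimal, and the reachable and observable part of the realization (11.15) is a relevant state space realization of the plant. Similarly, the realization (11.16) of the controller satisfies

$$\operatorname{rank} \begin{bmatrix} zI - A_p & 0 & 0 \\ B_c C_p & zI - A_c & B_c \end{bmatrix} < n_p + n_c, \quad z \in \lambda(A_p)$$

so that the $n_p$ modes of $A_p$ are unreachable and there exist $n_p$ pole-zero cancellations.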
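The non-minimality stated in Lemma 11.2 can also be observed numerically. The sketch below assembles the closed-loop realization (11.17) from a plant (11.13) and controller (11.14), forms the state matrix of (11.11) and (11.12), and computes the rank of the observability matrix of the plant realization and of the reachability matrix of the controller realization. For concreteness it uses the Example 1 plant and controller of Section 11.6.1, $P(z) = z/(z^2 - 1.6z + 0.89)$ and $C(z) = (z - 0.8)/z$; the canonical forms and the helper names (`closed_loop`, `rank_obs`, `rank_reach`) are our own illustrative choices.

```python
import numpy as np


def closed_loop(Ap, Bp, Cp, Ac, Bc, Cc, Dc):
    """Closed-loop realization (11.17): input r = [r1; r2], output w = [y; u]."""
    nc, m = Ac.shape[0], Bp.shape[1]
    p = Cp.shape[0]
    A = np.block([[Ap - Bp @ Dc @ Cp, Bp @ Cc],
                  [-Bc @ Cp, Ac]])
    B1 = np.vstack([Bp @ Dc, Bc])
    B2 = np.vstack([Bp, np.zeros((nc, m))])
    C1 = np.hstack([Cp, np.zeros((p, nc))])
    C2 = np.hstack([-Dc @ Cp, Cc])
    return A, B1, B2, C1, C2, Dc, np.eye(m)      # D21 = Dc, D22 = I


def rank_obs(A, C):
    n = A.shape[0]
    return np.linalg.matrix_rank(np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)]))


def rank_reach(A, B):
    n = A.shape[0]
    return np.linalg.matrix_rank(np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)]))


# P(z) = z / (z^2 - 1.6 z + 0.89) in controllable canonical form (n_p = 2)
Ap = np.array([[1.6, -0.89], [1.0, 0.0]]); Bp = np.array([[1.0], [0.0]]); Cp = np.array([[1.0, 0.0]])
# C(z) = (z - 0.8) / z = 1 - 0.8 / z (n_c = 1)
Ac = np.array([[0.0]]); Bc = np.array([[1.0]]); Cc = np.array([[-0.8]]); Dc = np.array([[1.0]])

A, B1, B2, C1, C2, D21, D22 = closed_loop(Ap, Bp, Cp, Ac, Bc, Cc, Dc)
Abar = A - B2 @ np.linalg.solve(D22, C2)           # state matrix of (11.11) and (11.12)
Bplant = B2 @ np.linalg.inv(D22)                   # input matrix of (11.11)
Bctrl = B1 - B2 @ np.linalg.solve(D22, D21)        # input matrix of (11.12)

print(rank_obs(Abar, C1))       # 2: only n_p of the 3 states are observable in (11.11)
print(rank_reach(Abar, Bctrl))  # 1: only n_c of the 3 states are reachable in (11.12)

z0 = 0.5 + 0.5j
P_hat = (C1 @ np.linalg.solve(z0 * np.eye(3) - Abar, Bplant))[0, 0]
print(np.isclose(P_hat, z0 / (z0**2 - 1.6 * z0 + 0.89)))   # True
```

With identified (noisy) matrices these ranks would be full numerically, which is why the model reduction of Section 11.5 is needed.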


Since exact pole-zero cancellations do not occur in the realizations (11.11) and (11.12) identified from finite data, the identified state space realizations have the higher dimension $n := n_p + n_c$. It is therefore necessary to obtain lower order models from the higher order models by using a model reduction procedure. This problem is treated in Section 11.5.


11.3.2 Subspace Identification Method 


We describe a subspace identification method based on the results of Section 10.6. Let $r_1(t), r_2(t), u(t), y(t)$, $t = 0, 1, \ldots, N + 2k - 2$ be a set of given finite data, where $N$ is sufficiently large and $k > n$. Recall that the exogenous inputs and the joint input-output process are defined as $r(t) = \begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} \in \mathbb{R}^d$ and $w(t) = \begin{bmatrix} y(t) \\ u(t) \end{bmatrix} \in \mathbb{R}^d$, where $d = p + m$.

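Let $k$ be the present time. Define the block Toeplitz matrix formed by the past data as

$$P_N := \begin{bmatrix} w(k-1) & w(k) & \cdots & w(k+N-2) \\ r(k-1) & r(k) & \cdots & r(k+N-2) \\ \vdots & \vdots & & \vdots \\ w(0) & w(1) & \cdots & w(N-1) \\ r(0) & r(1) & \cdots & r(N-1) \end{bmatrix} \in \mathbb{R}^{2kd \times N}$$

Similarly, the block Hankel matrices formed by the future of $r$ and $w$ are respectively defined as

$$R_f := \begin{bmatrix} r(k) & r(k+1) & \cdots & r(k+N-1) \\ r(k+1) & r(k+2) & \cdots & r(k+N) \\ \vdots & \vdots & & \vdots \\ r(2k-1) & r(2k) & \cdots & r(N+2k-2) \end{bmatrix} \in \mathbb{R}^{kd \times N}$$

and

$$W_f := \begin{bmatrix} w(k) & w(k+1) & \cdots & w(k+N-1) \\ w(k+1) & w(k+2) & \cdots & w(k+N) \\ \vdots & \vdots & & \vdots \\ w(2k-1) & w(2k) & \cdots & w(N+2k-2) \end{bmatrix} \in \mathbb{R}^{kd \times N}$$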
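The data matrices above can be assembled directly from sampled signals. The following NumPy sketch (helper names hypothetical) builds the past matrix $P_N$ with the most recent block row first, together with the future block Hankel matrices $R_f$ and $W_f$, from arrays `r` and `w` of shape $(N + 2k - 1, d)$ whose $t$th row holds $r(t)$ and $w(t)$. The small integer-valued test signals are chosen only so that individual entries are easy to check.

```python
import numpy as np


def block_hankel(x, start, rows, cols):
    """Block Hankel matrix whose i-th block row is [x(start+i), ..., x(start+i+cols-1)]."""
    return np.vstack([x[start + i:start + i + cols].T for i in range(rows)])


def past_future_matrices(r, w, k):
    """Past matrix P_N (2kd x N) and future matrices R_f, W_f (kd x N)."""
    N = r.shape[0] - 2 * k + 1                 # data are indexed t = 0, ..., N + 2k - 2
    z = np.hstack([w, r])                      # z(t) = [w(t); r(t)]
    # block rows z(k-1), z(k-2), ..., z(0): most recent past first
    P_N = np.vstack([block_hankel(z, i, 1, N) for i in reversed(range(k))])
    R_f = block_hankel(r, k, k, N)
    W_f = block_hankel(w, k, k, N)
    return P_N, R_f, W_f


k, d, N = 3, 2, 10
t = np.arange(N + 2 * k - 1, dtype=float)
r = np.column_stack([t, -t])                   # r(t) = [t; -t]
w = np.column_stack([10 + t, 20 + t])          # w(t) = [10 + t; 20 + t]
P_N, R_f, W_f = past_future_matrices(r, w, k)
print(P_N.shape, R_f.shape, W_f.shape)         # (12, 10) (6, 10) (6, 10)
```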
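We consider the LQ decomposition

$$\begin{bmatrix} R_f \\ P_N \\ W_f \end{bmatrix} = \begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \end{bmatrix} =: R Q^T \qquad (11.18)$$

where $R_{11} \in \mathbb{R}^{kd \times kd}$, $R_{22} \in \mathbb{R}^{2kd \times 2kd}$, $R_{33} \in \mathbb{R}^{kd \times kd}$ are lower block triangular and the $Q_i$ are orthogonal. Then, the conditional covariance matrices are given by

$$\Sigma_{ff|r} = R_{32} R_{32}^T + R_{33} R_{33}^T, \quad \Sigma_{pp|r} = R_{22} R_{22}^T, \quad \Sigma_{fp|r} = R_{32} R_{22}^T$$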

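The following closed-loop subspace identification algorithm is derived by using Algorithm B of Section 10.6.

Closed-loop Identification - CCA Method

Step 1: Compute the square root matrices such that
$$\Sigma_{ff|r} = L L^T, \quad \Sigma_{pp|r} = M M^T$$

Step 2: Compute the SVD of a normalized covariance matrix by
$$L^{-1} \Sigma_{fp|r} M^{-T} = U S V^T \approx \hat{U} \hat{S} \hat{V}^T$$
where $\hat{S}$ is obtained by deleting smaller singular values of $S$.

Step 3: Define the extended observability and reachability matrices as
$$\mathcal{O}_k = L \hat{U} \hat{S}^{1/2}, \quad \mathcal{C}_k = \hat{S}^{1/2} \hat{V}^T M^T$$

Step 4: Compute the estimate of the state vector by
$$\hat{X}_N = \mathcal{C}_k \Sigma_{pp|r}^{-1} P_N = \hat{S}^{1/2} \hat{V}^T M^{-1} P_N$$
and form the following matrices with $N - 1$ columns
$$\hat{X}_{N+1} = \hat{X}_N(:, 2{:}N), \quad \bar{X}_N = \hat{X}_N(:, 1{:}N-1)$$
$$W_{k|k} = W_f(1{:}d, 1{:}N-1), \quad R_{k|k} = R_f(1{:}d, 1{:}N-1)$$

Step 5: Compute the estimates of the matrices $(A, B, C, D)$ by applying the least-squares method to the regression model
$$\begin{bmatrix} \hat{X}_{N+1} \\ W_{k|k} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} \bar{X}_N \\ R_{k|k} \end{bmatrix} + \begin{bmatrix} \rho_w \\ \rho_v \end{bmatrix}$$

Step 6: Partition the matrices $B$, $C$, $D$ as
$$B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}, \quad C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ D_{21} & D_{22} \end{bmatrix}$$
and compute the higher order models $P(z)$ and $C(z)$ of the plant and controller by the formulas (11.11) and (11.12), respectively.

Step 7: Compute lower dimensional models by using a model reduction algorithm (see Section 11.5).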
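A compact NumPy sketch of Steps 1 to 5 follows. It is an illustration under stated assumptions, not a reference implementation: the data are assumed zero mean and arranged as rows `r[t]`, `w[t]`; the LQ factor is obtained from a QR decomposition of the transposed data matrix scaled by $1/\sqrt{N}$; Cholesky factors are used as the square roots in Step 1; and the full-order $(A, B, C, D)$ is returned without imposing the zero blocks of $D$ in Step 6. The function name `cca_closed_loop` is hypothetical.

```python
import numpy as np


def block_hankel(x, start, rows, cols):
    return np.vstack([x[start + i:start + i + cols].T for i in range(rows)])


def cca_closed_loop(r, w, k, n):
    """Sketch of Steps 1-5 of the CCA-based closed-loop method.

    r, w: arrays of shape (N + 2k - 1, d); n: chosen state dimension (n <= kd).
    Returns A, B, C, D and the canonical correlations (singular values in Step 2).
    """
    T, d = w.shape
    N = T - 2 * k + 1
    kd = k * d
    z = np.hstack([w, r])
    P_N = np.vstack([block_hankel(z, i, 1, N) for i in reversed(range(k))])
    R_f, W_f = block_hankel(r, k, k, N), block_hankel(w, k, k, N)

    # LQ decomposition (11.18) via QR of the transpose
    _, Rt = np.linalg.qr(np.vstack([R_f, P_N, W_f]).T / np.sqrt(N))
    L = Rt.T
    R22 = L[kd:3 * kd, kd:3 * kd]
    R32, R33 = L[3 * kd:, kd:3 * kd], L[3 * kd:, 3 * kd:]
    S_ff = R32 @ R32.T + R33 @ R33.T
    S_pp = R22 @ R22.T
    S_fp = R32 @ R22.T

    # Step 1: square roots; Step 2: SVD of L^{-1} S_fp M^{-T}
    Lf, M = np.linalg.cholesky(S_ff), np.linalg.cholesky(S_pp)
    G = np.linalg.solve(M, np.linalg.solve(Lf, S_fp).T).T
    U, s, Vt = np.linalg.svd(G)

    # Steps 3-4: state estimate X_N = S^{1/2} V^T M^{-1} P_N (O_k itself is not needed here)
    X = (np.sqrt(s[:n])[:, None] * Vt[:n]) @ np.linalg.solve(M, P_N)

    # Step 5: least squares for [A B; C D] using N - 1 columns
    Y = np.vstack([X[:, 1:], W_f[:d, :-1]])
    Z = np.vstack([X[:, :-1], R_f[:d, :-1]])
    Theta = np.linalg.lstsq(Z.T, Y.T, rcond=None)[0].T
    return Theta[:n, :n], Theta[:n, n:], Theta[n:, :n], Theta[n:, n:], s


# Usage on synthetic data (a simple filter of r plus independent noise)
rng = np.random.default_rng(1)
k, d, n, T = 3, 2, 2, 600
r = rng.standard_normal((T, d))
w = 0.5 * np.roll(r, 1, axis=0) + rng.standard_normal((T, d))
A, B, C, D, s = cca_closed_loop(r, w, k, n)
print(A.shape, B.shape, C.shape, D.shape, s.max() <= 1.0 + 1e-8)
```

Because $L L^T = \Sigma_{ff|r}$ contains $R_{32} R_{32}^T$, the singular values computed in Step 2 never exceed one, so they can be read as canonical correlations; the printed check reflects this.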
11.4 ORT Method


In this section, we develop a closed-loop subspace identification method based on 
the ORT method derived in Chapter 9. 


11.4.1 Orthogonal Decomposition of Joint Input-Output Process 


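As usual, we introduce Hilbert spaces generated by the exogenous inputs and by the joint input-output process, which are respectively denoted by

$$\mathbf{R} = \operatorname{span}\{r(\tau) \mid \tau = 0, \pm 1, \ldots\}, \quad \mathbf{W} = \operatorname{span}\{w(\tau) \mid \tau = 0, \pm 1, \ldots\}$$

We also define Hilbert subspaces spanned by the infinite past and infinite future of the various processes at the present time $t$ as

$$\mathbf{R}_t^- := \operatorname{span}\{r(\tau) \mid \tau < t\}, \quad \mathbf{W}_t^- := \operatorname{span}\{w(\tau) \mid \tau < t\}$$

and

$$\mathbf{R}_t^+ := \operatorname{span}\{r(\tau) \mid \tau \ge t\}, \quad \mathbf{W}_t^+ := \operatorname{span}\{w(\tau) \mid \tau \ge t\}$$

These are all subspaces of the ambient Hilbert space $\mathbf{H} := \mathbf{R} \vee \mathbf{W}$ spanned by the observable input and output processes $(r, w)$.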

Since there is no feedback from $w$ to $r$, the future of $r$ is conditionally uncorrelated with the past of $w$ given the past of $r$. From Theorem 9.1 (ii), this feedback-free condition is written as

$$E\{w(t) \mid \mathbf{R}\} = E\{w(t) \mid \mathbf{R}_{t+1}^-\}, \quad t = 0, \pm 1, \ldots \qquad (11.19)$$

implying that the smoothed estimate of $w$ based on $r$ is causal. It follows from (11.19) that

$$w_s(t) := w(t) - E\{w(t) \mid \mathbf{R}_{t+1}^-\} = w(t) - E\{w(t) \mid \mathbf{R}\} \in \mathbf{R}^\perp$$

where $\mathbf{R}^\perp$ is the orthogonal complement of $\mathbf{R}$ in $\mathbf{H}$, and $w_s$ is called the stochastic component of $w$. Similarly,

$$w_d(t) := E\{w(t) \mid \mathbf{R}\} = E\{w(t) \mid \mathbf{R}_{t+1}^-\}$$

is called the deterministic component of $w$. The deterministic component $w_d$ is the part of $w$ that can be linearly expressed in terms of the exogenous input $r$.

As in Section 9.4, we obtain the orthogonal decomposition of the joint input-output process $w = w_d + w_s$, i.e.,

$$\begin{bmatrix} y(t) \\ u(t) \end{bmatrix} = \begin{bmatrix} y_d(t) \\ u_d(t) \end{bmatrix} + \begin{bmatrix} y_s(t) \\ u_s(t) \end{bmatrix} \qquad (11.20)$$

where the deterministic and stochastic components are mutually uncorrelated, so that we see from Lemma 9.3 that

$$E\{w_s(t) w_d^T(\tau)\} = 0, \quad \forall t, \tau = 0, \pm 1, \ldots$$


Applying this orthogonal decomposition to the feedback system shown in Figure 
11.2, we have equations satisfied by the deterministic and stochastic components. 


Lemma 11.3. The deterministic and stochastic components respectively satisfy the decoupled equations

$$y_d(t) = P(z) u_d(t) \qquad (11.21a)$$
$$u_d(t) = r_2(t) + C(z)[r_1(t) - y_d(t)] \qquad (11.21b)$$

and

$$y_s(t) = P(z) u_s(t) + H(z) v(t) \qquad (11.22a)$$
$$u_s(t) = -C(z) y_s(t) + F(z) \eta(t) \qquad (11.22b)$$

Proof. From (11.3), (11.4) and (11.20), we have

$$y_d(t) + y_s(t) = P(z)[u_d(t) + u_s(t)] + H(z) v(t)$$
$$u_d(t) + u_s(t) = r_2(t) + C(z)[r_1(t) - y_d(t) - y_s(t)] + F(z) \eta(t)$$

Since $v$, $\eta$, $y_s$, $u_s$ are orthogonal to $\mathbf{R}$, the orthogonal projections of the above equations onto $\mathbf{R}$ and $\mathbf{R}^\perp$ yield (11.21) and (11.22), respectively.


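We can easily see from (11.21) that

$$\begin{bmatrix} y_d(t) \\ u_d(t) \end{bmatrix} = \begin{bmatrix} P(z)[I + C(z)P(z)]^{-1} C(z) & P(z)[I + C(z)P(z)]^{-1} \\ [I + C(z)P(z)]^{-1} C(z) & [I + C(z)P(z)]^{-1} \end{bmatrix} \begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} \qquad (11.23)$$

Since the transfer matrices in the right-hand side of (11.23) are the same as those of (11.7), the transfer matrices of the plant and the controller can be obtained from a state space realization of the deterministic component $w_d$.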

We can draw some interesting observations from Lemma 11.3 for the decoupled deterministic and stochastic components.

1. The realizations of the deterministic and stochastic components can be decoupled, since the two components are mutually uncorrelated. It should be noted, however, that this holds in the infinite data case but not in practice. This is because, with finite input-output data, the estimate of the stochastic component $w_s$ is influenced by the unknown initial condition associated with the estimate of the deterministic component $w_d$, as discussed in Section 9.6. However, the effect of the unknown initial conditions decreases as the data length increases.

2. Suppose that $P(z)$ and $C(z)$ are stable. Then, we can apply the ORT method to the deterministic part (11.21) to obtain state space realizations of $P(z)$ and $C(z)$; see the Appendix in Section 11.8. In this case, we also show that the noise models $H(z)$ and $F(z)$ can be obtained from the stochastic part (11.22).

3. If $P(z)$ and/or $C(z)$ is open-loop unstable or marginally stable, we cannot follow the above procedure, since the deterministic (or stochastic) subspace method applied to (11.21) (or (11.22)) yields erroneous results. This is because it is impossible to connect the second-order stationary processes $u_d$ and $y_d$ (or $u_s$ and $y_s$) by an unstable transfer matrix $P(z)$. Moreover, controllers in practical control systems are often marginally stable because of implemented integrators. In this case, we need the joint input-output approach as shown below.


11.4.2 Realization of Closed-loop System 


Suppose that for each $t$ the input space $\mathbf{R}$ admits the direct sum decomposition

$$\mathbf{R} = \mathbf{R}_t^- \vee \mathbf{R}_t^+, \quad \mathbf{R}_t^- \cap \mathbf{R}_t^+ = \{0\}$$

An analogous condition is that the spectral density matrix of $r$ is strictly positive definite on the unit circle, i.e., $\Phi_{rr}(\omega) \ge c I_d$ for some $c > 0$, or that all canonical angles between the past and future subspaces of $r$ are strictly positive. As already mentioned, in practice it suffices to assume that $r$ satisfies a sufficiently high order PE condition and that the "true" system is finite dimensional.

Let $\mathbf{W}^d$ be the subspace spanned by the deterministic component $w_d$, and let $\mathbf{W}_t^{d+}$ denote the subspace generated by the future $w_d(\tau)$, $\tau = t, t+1, \ldots$. According to Subsection 9.5.2, we define the oblique predictor subspace as

$$\mathbf{X}_t^{+/-} := \hat{E}_{\|\mathbf{R}_t^+}\{\mathbf{W}_t^{d+} \mid \mathbf{R}_t^-\} \qquad (11.24)$$

This is the oblique projection of $\mathbf{W}_t^{d+}$ onto the past $\mathbf{R}_t^-$ along the future $\mathbf{R}_t^+$, so that $\mathbf{X}_t^{+/-}$ is the state space for the deterministic component. Clearly, if $r$ is a white noise process, (11.24) reduces to the orthogonal projection onto $\mathbf{R}_t^-$.
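For finite data, the oblique projection in (11.24) has a simple least-squares counterpart: regress the future rows on the joint row space of the past and future inputs and keep only the part carried by the past inputs. The sketch below (function name hypothetical, row spaces assumed complementary) does this with NumPy and checks two properties stated above: it recovers the past-input term exactly when the data are a linear combination of past and future inputs, and it coincides with the orthogonal projection onto the past when the past and future rows are orthogonal, which is the finite-data analogue of a white $r$.

```python
import numpy as np


def oblique_projection(Y, Rp, Rf):
    """Project the rows of Y onto row(Rp) along row(Rf)."""
    Z = np.vstack([Rp, Rf])
    K = np.linalg.lstsq(Z.T, Y.T, rcond=None)[0].T   # Y ~ K Z: orthogonal projection onto row(Rp) + row(Rf)
    return K[:, :Rp.shape[0]] @ Rp                   # keep only the component expressed in Rp


rng = np.random.default_rng(0)
Rp, Rf = rng.standard_normal((3, 200)), rng.standard_normal((2, 200))
M1, M2 = rng.standard_normal((4, 3)), rng.standard_normal((4, 2))
Y = M1 @ Rp + M2 @ Rf
print(np.allclose(oblique_projection(Y, Rp, Rf), M1 @ Rp))   # True

# Orthonormal, mutually orthogonal rows: oblique projection = orthogonal projection onto row(Rp)
Q, _ = np.linalg.qr(rng.standard_normal((200, 5)))
Rp_o, Rf_o = Q[:, :3].T, Q[:, 3:].T
Y2 = rng.standard_normal((4, 200))
print(np.allclose(oblique_projection(Y2, Rp_o, Rf_o), Y2 @ Rp_o.T @ Rp_o))   # True
```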

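Let the dimension of the state space $\mathbf{X}_t^{+/-}$ be $n$, which in general equals the sum of the orders of the plant and the controller. From Theorem 9.3, any basis vector $x_d(t) \in \mathbf{X}_t^{+/-}$ yields a state space representation of $w_d(t)$, i.e.,

$$x_d(t+1) = A x_d(t) + \begin{bmatrix} B_1 & B_2 \end{bmatrix} r(t) \qquad (11.25a)$$

$$\begin{bmatrix} y_d(t) \\ u_d(t) \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} x_d(t) + \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} \begin{bmatrix} r_1(t) \\ r_2(t) \end{bmatrix} \qquad (11.25b)$$

where $A \in \mathbb{R}^{n \times n}$. Since $P(z)$ is assumed to be strictly proper, we have $D_{11} = 0$, $D_{12} = 0$. Also, from the configuration of Figure 11.2, $D_{22} = I$ and $D_{21} = D_c$ hold. It therefore follows from (11.25) that the transfer matrices of the closed-loop system are given by

$$T_{yr_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_1 & 0 \end{array}\right], \quad T_{ur_2}(z) = \left[\begin{array}{c|c} A & B_2 \\ \hline C_2 & D_{22} \end{array}\right], \quad T_{ur_1}(z) = \left[\begin{array}{c|c} A & B_1 \\ \hline C_2 & D_{21} \end{array}\right]$$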
Hence, from (11.7), we have $T_{yr_2}(z) = P(z) T_{ur_2}(z)$ and $T_{ur_1}(z) = T_{ur_2}(z) C(z)$, so that the plant and the controller are computed by $P(z) = T_{yr_2}(z) T_{ur_2}^{-1}(z)$ and $C(z) = T_{ur_2}^{-1}(z) T_{ur_1}(z)$, respectively.


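Lemma 11.4. Let $\det D_{22} \ne 0$. Then, the (non-minimal) realizations of the plant and controller are respectively given by

$$P(z) = \left[\begin{array}{c|c} A - B_2 D_{22}^{-1} C_2 & B_2 D_{22}^{-1} \\ \hline C_1 & 0 \end{array}\right] \qquad (11.26)$$

and

$$C(z) = \left[\begin{array}{c|c} A - B_2 D_{22}^{-1} C_2 & B_1 - B_2 D_{22}^{-1} D_{21} \\ \hline D_{22}^{-1} C_2 & D_{22}^{-1} D_{21} \end{array}\right] \qquad (11.27)$$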


Proof. A proof is similar to that of Lemma 11.1. 


Remark 11.2. Lemma 11.4 is seemingly the same as Lemma 11.1. However, the subspace identification algorithm derived from Lemma 11.4 is different from the one derived from Lemma 11.1, since the two methods compute the state space realizations in quite different ways.


Since the transfer matrices $P(z)$ and $C(z)$ obtained from the realization (11.25) of the deterministic component are of higher order, we apply a model reduction technique to get lower order models. This will be discussed in detail in Section 11.5.


11.4.3 Subspace Identification Method 


In this section, we present a subspace identification method based on finite data. The notation used here is the same as that of Subsection 11.3.2. Suppose that finite input-output data $r_1(t), r_2(t), u(t), y(t)$ for $t = 0, 1, \ldots, N + 2k - 2$ are given with $N$ sufficiently large and $k > n$. We assume that they are samples from jointly stationary processes with zero means and finite covariance matrices.

Let $R_{0|k-1}, R_{k|2k-1} \in \mathbb{R}^{kd \times N}$ be the block Hankel matrices generated by the past and the future exogenous inputs, and similarly for $W_{0|k-1}, W_{k|2k-1} \in \mathbb{R}^{kd \times N}$. Moreover, we define the block Hankel matrices

$$R_{0|2k-1} := \begin{bmatrix} R_{0|k-1} \\ R_{k|2k-1} \end{bmatrix}, \quad W_{0|2k-1} := \begin{bmatrix} W_{0|k-1} \\ W_{k|2k-1} \end{bmatrix}$$

and then the subspaces $\mathbf{R}_{0|2k-1}$ and $\mathbf{W}_{0|2k-1}$ generated by the rows of $R_{0|2k-1}$ and $W_{0|2k-1}$, respectively.

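The first step of subspace identification is to obtain the deterministic component $w_d$ by means of the orthogonal projection

$$\hat{W}_{0|2k-1}^d := \hat{E}\{W_{0|2k-1} \mid \mathbf{R}_{0|2k-1}\} \qquad (11.28)$$

The following development is based on the argument of Section 9.7. To derive the matrix input-output equation satisfied by $\hat{W}_{k|2k-1}^d$ from (11.25), we define

$$B := \begin{bmatrix} B_1 & B_2 \end{bmatrix} \in \mathbb{R}^{n \times d}, \quad C := \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \in \mathbb{R}^{d \times n}, \quad D := \begin{bmatrix} 0 & 0 \\ D_{21} & D_{22} \end{bmatrix} \in \mathbb{R}^{d \times d}$$

and the extended observability matrix

$$\mathcal{O}_k = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1} \end{bmatrix} \in \mathbb{R}^{kd \times n}, \quad k > n$$

and the lower triangular block Toeplitz matrix

$$\Psi_k = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ CA^{k-2}B & \cdots & CB & D \end{bmatrix} \in \mathbb{R}^{kd \times kd} \qquad (11.29)$$

Then, it follows from (11.25) that

$$W_{k|2k-1}^d = \mathcal{O}_k X_k^d + \Psi_k R_{k|2k-1} \qquad (11.30)$$

where

$$X_k^d = \begin{bmatrix} x_d(k) & x_d(k+1) & \cdots & x_d(k+N-1) \end{bmatrix} \in \mathbb{R}^{n \times N}$$

By using Lemma 9.8, the matrix $\hat{W}_{k|2k-1}^d$, a part of $\hat{W}_{0|2k-1}^d$ defined by (11.28), satisfies the same equation as (11.30), i.e.,

$$\hat{W}_{k|2k-1}^d = \mathcal{O}_k \hat{X}_k^d + \Psi_k R_{k|2k-1} \qquad (11.31)$$

[see (9.44)], where the state vector is given by $\hat{X}_k^d := \hat{E}\{X_k^d \mid \mathbf{R}_{0|2k-1}\}$.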
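Motivated by the above discussion, we consider the following LQ decomposition

$$\begin{bmatrix} R_{k|2k-1} \\ R_{0|k-1} \\ W_{0|k-1} \\ W_{k|2k-1} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 & 0 \\ L_{21} & L_{22} & 0 & 0 \\ L_{31} & L_{32} & L_{33} & 0 \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \\ Q_4^T \end{bmatrix} \qquad (11.32)$$

where $L_{11}, L_{22}, L_{33}, L_{44} \in \mathbb{R}^{kd \times kd}$ are block lower triangular, and the $Q_i$ are orthogonal. Then, from (11.28), the deterministic component can be given by

$$\hat{W}_{k|2k-1}^d = \hat{E}\{W_{k|2k-1} \mid \mathbf{R}_{0|2k-1}\} = \begin{bmatrix} L_{41} & L_{42} \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}$$

Thus from the above equation and (11.31), noting that $R_{k|2k-1} = L_{11} Q_1^T$,

$$\hat{W}_{k|2k-1}^d = L_{41} Q_1^T + L_{42} Q_2^T = \Psi_k L_{11} Q_1^T + \mathcal{O}_k \hat{X}_k^d \qquad (11.33)$$

Post-multiplying (11.33) by $Q_2$ and using the orthogonality $Q_1^T Q_2 = 0$, $Q_2^T Q_2 = I$, we see that

$$L_{42} = \mathcal{O}_k \hat{X}_k^d Q_2$$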
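The projection (11.28), restricted to the future rows, can be read off the LQ decomposition (11.32) as $[L_{41} \; L_{42}][Q_1^T; Q_2^T]$. The NumPy sketch below (hypothetical helper name, random data only for checking) obtains the LQ factors from a QR decomposition of the transposed data matrix, stacked in the order of (11.32), and verifies that this expression equals the least-squares projection of $W_{k|2k-1}$ onto the row space of $R_{0|2k-1}$.

```python
import numpy as np


def lq_deterministic(R_past, R_fut, W_past, W_fut):
    """LQ decomposition (11.32) and the projection W^d_{k|2k-1} = L41 Q1^T + L42 Q2^T."""
    kd = R_fut.shape[0]
    Q, Rt = np.linalg.qr(np.vstack([R_fut, R_past, W_past, W_fut]).T)
    L = Rt.T                                    # lower triangular factor
    L11, L41, L42 = L[:kd, :kd], L[3 * kd:, :kd], L[3 * kd:, kd:2 * kd]
    Q1, Q2 = Q[:, :kd], Q[:, kd:2 * kd]
    Wd = L41 @ Q1.T + L42 @ Q2.T
    return Wd, L11, L41, L42


rng = np.random.default_rng(0)
kd, N = 4, 300
R_past, R_fut = rng.standard_normal((kd, N)), rng.standard_normal((kd, N))
W_past, W_fut = rng.standard_normal((kd, N)), rng.standard_normal((kd, N))
Wd, L11, L41, L42 = lq_deterministic(R_past, R_fut, W_past, W_fut)

R_all = np.vstack([R_fut, R_past])
Wd_ls = (R_all.T @ np.linalg.lstsq(R_all.T, W_fut.T, rcond=None)[0]).T
print(np.allclose(Wd, Wd_ls))   # True
```

Computing the LQ factor once and reusing its blocks, as in the algorithm below, avoids forming the projection matrices explicitly.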
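In the following algorithm, it is assumed that the LQ decomposition of (11.32) is already given.

Closed-loop Identification - ORT Method

Step 1: Compute the SVD of $L_{42}$, i.e.,

$$L_{42} = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S_1 & 0 \\ 0 & S_2 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \approx U_1 S_1 V_1^T \qquad (11.34)$$

where $S_1$ is obtained by neglecting sufficiently small singular values. Thus the dimension of the state vector is the same as the dimension of $S_1$, so that we have

$$\mathcal{O}_k \hat{X}_k^d Q_2 = L_{42} \approx (U_1 S_1^{1/2})(S_1^{1/2} V_1^T)$$

Under the assumption that $\hat{X}_k^d Q_2$ has full rank, the extended observability matrix is given by

$$\mathcal{O}_k = U_1 S_1^{1/2}$$

Step 2: Compute the matrices $A$ and $C$ by

$$A = \mathcal{O}_{k-1}^{\dagger} \overline{\mathcal{O}}_k, \quad C = \mathcal{O}_k(1{:}d, :)$$

where $\mathcal{O}_{k-1}$ is obtained by deleting the last $d$ rows from $\mathcal{O}_k$, and $\overline{\mathcal{O}}_k$ is obtained by deleting the first $d$ rows from $\mathcal{O}_k$.

Step 3: Given the estimates of $A$ and $C$, compute the least-squares estimates of $B$ and $D$ from

$$U_2^T \Psi_k(B, D) = U_2^T L_{41} L_{11}^{-1}$$

where $L_{11}$ and $L_{41}$ are obtained by (11.32), $U_2$ of (11.34) satisfies $U_2^T \mathcal{O}_k = 0$, and the constraints $D_{11} = 0$, $D_{12} = 0$ are imposed.

Step 4: Partition the obtained matrices $B$, $C$, $D$ as

$$B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}, \quad C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ D_{21} & D_{22} \end{bmatrix}$$

and compute the state space realizations of $P(z)$ and $C(z)$ from (11.26) and (11.27), respectively.

Step 5: Compute lower order models of $P(z)$ and $C(z)$ by using a model reduction method. This will be explained in the next section.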
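Steps 1 and 2 of the ORT algorithm reduce to an SVD followed by a shift-invariance least-squares problem. The NumPy sketch below (hypothetical function name) takes $L_{42}$ and a chosen order $n$, forms $\mathcal{O}_k = U_1 S_1^{1/2}$, and returns $A$ and $C$. To exercise it we use a synthetic $L_{42} = \mathcal{O}_k X$ built from a known $(A, C)$ and a random full-rank $X$, which is an assumption made only for the test; since $\mathcal{O}_k$ is recovered only up to a similarity transform, eigenvalues are compared rather than matrix entries.

```python
import numpy as np


def ort_A_C(L42, n, d):
    """Steps 1-2 of the ORT method: SVD of L42, O_k = U1 S1^{1/2}, then A and C."""
    U, s, _ = np.linalg.svd(L42)
    Ok = U[:, :n] * np.sqrt(s[:n])                   # O_k = U1 S1^{1/2}
    C = Ok[:d, :]                                    # first block row
    A = np.linalg.pinv(Ok[:-d, :]) @ Ok[d:, :]       # shift invariance: O_{k-1}^dagger times O_k minus its first block row
    return A, C, s


# Synthetic check with a known (A, C) whose eigenvalues are 0.9, 0.5, -0.4
A_true = np.array([[0.9, 0.2, 0.0], [0.0, 0.5, 0.1], [0.0, 0.0, -0.4]])
rng = np.random.default_rng(0)
d, k, n = 2, 5, 3
C_true = rng.standard_normal((d, n))
Ok_true = np.vstack([C_true @ np.linalg.matrix_power(A_true, i) for i in range(k)])
L42 = Ok_true @ rng.standard_normal((n, k * d))
A_est, C_est, s = ort_A_C(L42, n, d)
print(np.sort(np.linalg.eigvals(A_est).real))        # approximately [-0.4, 0.5, 0.9]
```

With real data, the order $n$ would be chosen by inspecting the gap in `s`, as described in Step 1.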
11.5 Model Reduction


As mentioned already, all the identified transfer matrices have higher orders than the true ones. To recover reduced order models from Lemma 11.1 (or Lemma 11.4), it is therefore necessary to delete nearly unreachable and/or unobservable modes. Since the open-loop plant is possibly unstable, we need a model reduction technique that can be applied to both stable and unstable transfer matrices [168, 186].

In this section, we employ a direct model reduction method introduced in Lemma 3.7. That technique starts with a given balanced realization, but the higher order models in question are neither necessarily balanced nor minimal. Hence a desired model reduction procedure should have the following properties.


(a) Applicable to non-minimal and non-balanced realizations. 


(b) Numerically reliable. 


Let $G(z) := (A, B, C, D)$ be a realization to be reduced, where we assume that $A \in \mathbb{R}^{n \times n}$ is stable. Let $P$ and $Q$ be the reachability and observability Gramians, respectively, satisfying

$$P = APA^T + BB^T, \quad Q = A^T Q A + C^T C \qquad (11.35)$$

For the computation of Gramians for unstable $A$, see Lemma 3.9.
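A similarity transform of the state vector by a matrix $Z$ yields

$$\left[\begin{array}{c|c} Z^{-1}AZ & Z^{-1}B \\ \hline CZ & D \end{array}\right] = \left[\begin{array}{cc|c} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ \hline C_1 & C_2 & D \end{array}\right] \qquad (11.36)$$

Define $Z = \begin{bmatrix} T & \bar{T} \end{bmatrix}$ and $Z^{-1} = \begin{bmatrix} L \\ \bar{L} \end{bmatrix}$. Then, we have

$$\begin{bmatrix} LAT & LA\bar{T} \\ \bar{L}AT & \bar{L}A\bar{T} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad \begin{bmatrix} LB \\ \bar{L}B \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad \begin{bmatrix} CT & C\bar{T} \end{bmatrix} = \begin{bmatrix} C_1 & C_2 \end{bmatrix}$$

so that we get $A_{11} = LAT$, $B_1 = LB$ and $C_1 = CT$.

The requirement (a) mentioned above is fulfilled by computing the matrices $T$ and $L$ without actually forming the matrices $Z$ and $Z^{-1}$. Also, the requirement (b) is attained by using SVD-based computation. The following algorithm satisfies these requirements.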


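SR Algorithm

Step 1: Obtain the Gramians $P$ and $Q$ by solving (11.35).

Step 2: Compute the factorizations

$$P = S^T S, \quad Q = R^T R$$

Note that chol in MATLAB® does not work unless $P$ and $Q$ are positive definite.

Step 3: Compute the SVD of $SR^T \in \mathbb{R}^{n \times n}$ as

$$SR^T = U \Sigma V^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \qquad (11.37)$$

where $\Sigma = \operatorname{diag}(\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} \ge \cdots \ge \sigma_n \ge 0)$, whose diagonal elements are the Hankel singular values of the system.

Step 4: Partition $\Sigma = \operatorname{diag}\{\Sigma_1, \Sigma_2\}$, where

$$\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \quad \Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \ldots, \sigma_n)$$

and define the matrices $T$ and $L$ as

$$T = S^T U_1 \Sigma_1^{-1/2}, \quad L = \Sigma_1^{-1/2} V_1^T R \qquad (11.38)$$

Then, a reduced order model is obtained by

$$G_r(z) = (LAT, LB, CT, D) \qquad (11.39)$$

Using Lemma 3.7, we can prove that $G_r(z)$ is a reduced order model.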


Lemma 11.5. A reduced order model is given by $G_r(z)$ of (11.39). In general, $G_r(z)$ is not balanced, but if we take the parameter $r$ so that $\Sigma_1 > 0$, $\Sigma_2 = 0$, then $G_r(z)$ is balanced and minimal.


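Proof. By the definition of Hankel singular values,

$$\sqrt{\lambda_i(PQ)} = \sqrt{\lambda_i(S^T S R^T R)} = \sqrt{\lambda_i(R S^T S R^T)} = \sqrt{\sigma_i(SR^T)^2} = \sigma_i(SR^T)$$

This shows that the diagonal elements of $\Sigma$ obtained in Step 3 of the SR algorithm are the Hankel singular values. Pre-multiplying the first equation of (11.35) by $\Sigma^{-1/2} V^T R$ and post-multiplying by $R^T V \Sigma^{-1/2}$ yield

$$\Sigma^{-1/2} V^T R P R^T V \Sigma^{-1/2} = \Sigma^{-1/2} V^T R A P A^T R^T V \Sigma^{-1/2} + \Sigma^{-1/2} V^T R (BB^T) R^T V \Sigma^{-1/2} =: I_1 + I_2 \qquad (11.40)$$

From $P = S^T S$ and (11.37), the left-hand side of (11.40) becomes

$$\Sigma^{-1/2} V^T R S^T S R^T V \Sigma^{-1/2} = \Sigma^{-1/2} V^T (V \Sigma U^T)(U \Sigma V^T) V \Sigma^{-1/2} = \Sigma \qquad (11.41)$$

To compute the right-hand side of (11.40), we note that

$$\Sigma^{-1/2} V^T R = \begin{bmatrix} \Sigma_1^{-1/2} V_1^T R \\ \Sigma_2^{-1/2} V_2^T R \end{bmatrix} =: \begin{bmatrix} L \\ \bar{L} \end{bmatrix} \qquad (11.42)$$

By using the fact that $U U^T = I_n$, we get

$$P = S^T U U^T S = S^T \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1^{-1/2} & 0 \\ 0 & \Sigma_2^{-1/2} \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} \Sigma_1^{-1/2} & 0 \\ 0 & \Sigma_2^{-1/2} \end{bmatrix} \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} S = T \Sigma_1 T^T + \bar{T} \Sigma_2 \bar{T}^T \qquad (11.43)$$

where $T$ is defined by (11.38) and $\bar{T} := S^T U_2 \Sigma_2^{-1/2}$. Hence, from (11.40), (11.42) and (11.43),

$$I_1 = \begin{bmatrix} L \\ \bar{L} \end{bmatrix} A (T \Sigma_1 T^T + \bar{T} \Sigma_2 \bar{T}^T) A^T \begin{bmatrix} L^T & \bar{L}^T \end{bmatrix}, \quad I_2 = \begin{bmatrix} L \\ \bar{L} \end{bmatrix} BB^T \begin{bmatrix} L^T & \bar{L}^T \end{bmatrix}$$

Thus from (11.41), the (1, 1)-block of (11.40) is given by

$$\Sigma_1 = (LAT)\Sigma_1(LAT)^T + (LB)(LB)^T + (LA\bar{T})\Sigma_2(LA\bar{T})^T = A_{11}\Sigma_1 A_{11}^T + B_1 B_1^T + A_{12}\Sigma_2 A_{12}^T \qquad (11.44)$$

Similarly, from the second equation of (11.35), we have

$$\Sigma_1 = A_{11}^T \Sigma_1 A_{11} + C_1^T C_1 + A_{21}^T \Sigma_2 A_{21} \qquad (11.45)$$

Equations (11.44) and (11.45) derived above are the same as (3.41a) and (3.43), respectively. Thus $G_r(z)$ is a reduced order model of $G(z)$, but is not balanced. Putting $\Sigma_2 = 0$ in (11.44) and (11.45) gives

$$\Sigma_1 = A_{11}\Sigma_1 A_{11}^T + B_1 B_1^T, \quad \Sigma_1 = A_{11}^T \Sigma_1 A_{11} + C_1^T C_1$$

implying that $G_r(z)$ is balanced. The minimality of $G_r(z)$ is proved similarly to Lemma 3.7.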


In the SR algorithm derived above, it is assumed that $A$ is stable. For the case where $A$ is unstable, defining the Gramians $P$ and $Q$ as in Definition 3.10, it is possible to compute them by the algorithm of Lemma 3.9. Hence, no change is needed in the SR algorithm except for Step 1.
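A compact NumPy sketch of the SR algorithm for a stable $A$ follows. It is an illustration under stated assumptions, not a reference implementation: the Gramians are obtained by solving (11.35) through Kronecker-product vectorization (adequate only for small $n$), and, because a Cholesky factorization fails for singular Gramians, the factors with $P = S^T S$ and $Q = R^T R$ are computed from symmetric eigendecompositions. The example reduces a non-minimal 3rd-order realization (one unobservable mode appended to a minimal 2nd-order system, values chosen arbitrarily) to order $r = 2$, where $\Sigma_2 \approx 0$. All function names are hypothetical.

```python
import numpy as np


def dlyap(A, W):
    """Solve X = A X A^T + W by vectorization (row-major vec; small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.ravel())
    return x.reshape(n, n)


def psd_factor(X):
    """Return F with F^T F = X for a symmetric positive semidefinite X."""
    lam, V = np.linalg.eigh((X + X.T) / 2)
    return np.sqrt(np.clip(lam, 0.0, None))[:, None] * V.T


def sr_reduce(A, B, C, D, r):
    P = dlyap(A, B @ B.T)                      # Step 1: P = A P A^T + B B^T
    Q = dlyap(A.T, C.T @ C)                    #         Q = A^T Q A + C^T C
    S, R = psd_factor(P), psd_factor(Q)        # Step 2: P = S^T S, Q = R^T R
    U, hsv, Vt = np.linalg.svd(S @ R.T)        # Step 3: Hankel singular values
    s1 = np.sqrt(hsv[:r])
    T = (S.T @ U[:, :r]) / s1                  # Step 4: T = S^T U1 Sigma1^{-1/2}
    L = (Vt[:r] / s1[:, None]) @ R             #         L = Sigma1^{-1/2} V1^T R
    return (L @ A @ T, L @ B, C @ T, D), hsv   # (11.39)


def tf(sys, z):
    A, B, C, D = sys
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B) + D


# Minimal 2nd-order system plus one unobservable mode at 0.7
A = np.array([[0.5, 0.1, 0.0], [0.0, -0.3, 0.0], [0.0, 0.0, 0.7]])
B = np.array([[1.0], [0.5], [1.0]])
C = np.array([[1.0, 2.0, 0.0]])
D = np.zeros((1, 1))
red, hsv = sr_reduce(A, B, C, D, r=2)
print(hsv)                                               # the third value is numerically zero
print(np.allclose(tf((A, B, C, D), 0.3 + 0.4j), tf(red, 0.3 + 0.4j), atol=1e-6))
```

For unstable $A$, only `dlyap` would have to be replaced by the Gramian computation of Lemma 3.9, in line with the remark above.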


11.6 Numerical Results 


Some numerical results for closed-loop system identification are presented. The first model is a closed-loop system with a 2nd-order plant and a 1st-order controller, for which results obtained by the ORT and CCA methods are compared. In the second example, we present identification results for a 5th-order plant with a 4th-order controller by means of the ORT method. The feedback control system used in the present simulation is displayed in Figure 11.3.

Figure 11.3. A feedback control system


11.6.1 Example 1 
Suppose that the transfer functions of plant and controller are given by

$$P(z) = \frac{z}{z^2 - 1.6z + 0.89}, \quad C(z) = \frac{z - 0.8}{z}$$

where the closed-loop poles are located at $z = 0, 0.3, 0.3$. We assume that the noise $v$ is an ARMA process generated by passing a Gaussian white noise through

$$H(z) = \frac{1 - 1.56z^{-1} + 1.045z^{-2} - 0.3338z^{-3}}{1 - 2.35z^{-1} + 2.09z^{-2} - 0.6675z^{-3}}$$

This model is a slightly modified version of the one used in [160], in which only the probing input $r_2$ is used to identify the plant; here we include both $r_1$ and $r_2$ as reference inputs in order to identify both the plant and controller. The reference input $r_1$ is a composite sinusoid of the form

$$r_1(t) = \rho \sum_{j=1}^{30} A_j \sin(\omega_j t + \phi_j), \quad t = 0, 1, \ldots, N + 2k - 2$$

where the magnitude $\rho$ is adjusted so that $\sigma_1^2 = 1$, and the $A_j$ are independent samples from $N(0, 1)$. The parameters $\omega_j$ and $\phi_j$ are uniformly distributed over $(0, \pi)$, so that $r_1$ satisfies the PE condition of order 60. The input $r_2$ and the white noise driving $v$ are Gaussian white noises with variances $\sigma_2^2 = (0.2)^2$ and $\sigma^2 = 1/9$, respectively.
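The stated pole locations and the excitation design of Example 1 can be reproduced with a few lines of NumPy. The sketch below forms the closed-loop characteristic polynomial $(z^2 - 1.6z + 0.89)z + z(z - 0.8)$, computes the poles of the noise filter $H(z)$, and generates a 30-component composite sinusoid $r_1$ whose scale $\rho$ is set from the sample standard deviation. The random seed and the helper name `composite_sinusoid` are arbitrary choices for illustration, so the particular realization of $r_1$ is not the one used in the experiments.

```python
import numpy as np

# Plant P(z) = z / (z^2 - 1.6 z + 0.89), controller C(z) = (z - 0.8) / z
num_P, den_P = [1.0, 0.0], [1.0, -1.6, 0.89]
num_C, den_C = [1.0, -0.8], [1.0, 0.0]

# Closed-loop characteristic polynomial: den_P * den_C + num_P * num_C
char = np.polyadd(np.polymul(den_P, den_C), np.polymul(num_P, num_C))
print(char)                        # z^3 - 0.6 z^2 + 0.09 z = z (z - 0.3)^2
print(np.roots(char))              # approximately 0.3, 0.3 and 0

# Denominator of the noise filter H(z), written in powers of z
noise_poles = np.roots([1.0, -2.35, 2.09, -0.6675])
print(np.abs(noise_poles))         # all below 1, so H(z) is stable


def composite_sinusoid(length, n_sin=30, seed=0):
    """r1(t) = rho * sum_j A_j sin(w_j t + phi_j), scaled to unit sample variance."""
    rng = np.random.default_rng(seed)
    amp = rng.standard_normal(n_sin)                 # A_j ~ N(0, 1)
    omega = rng.uniform(0.0, np.pi, n_sin)           # w_j ~ U(0, pi)
    phase = rng.uniform(0.0, np.pi, n_sin)           # phi_j ~ U(0, pi)
    t = np.arange(length)
    s = (amp[:, None] * np.sin(omega[:, None] * t + phase[:, None])).sum(axis=0)
    return s / np.std(s)                             # rho = 1 / std(s)


N, k = 2000, 15
r1 = composite_sinusoid(N + 2 * k - 1)
print(r1.shape, round(float(np.var(r1)), 6))         # (2029,) 1.0
```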

For the ORT method, since the sum of the orders of the plant and controller is three, 3rd-order state space models are fitted to the input-output data $(r, w)$. Then, the 3rd-order plant and controller models so identified are reduced to second- and first-order models, respectively.

On the other hand, for the CCA method, 6th-order models are fitted to $(r, w)$, because the sum of the orders of the plant, controller and noise model is six, and because the state space model cannot be divided into separate deterministic and stochastic components. Thus, in this case, the identified 6th-order models of the plant and controller are reduced to second- and first-order models, respectively.

Figure 11.4. Estimates of poles, (+): plant, (×): controller

Figure 11.5. Bode plots of P(z)

Case 1: We take the number of data points $N = 2000$ and the number of block rows $k = 15$, and generate 30 data sets, each with different samples for $r_1$, $r_2$ and $v$. Figures 11.4(a) and 11.4(b) respectively display the poles of the plant and controller identified by the ORT and CCA methods, where + and × denote the true poles of the plant and controller, respectively. Figures 11.5(a) and 11.5(b) respectively display the Bode plots of the estimated plant, and Figures 11.6(a) and 11.6(b) the Bode plots of the estimated controller. We see from these figures that the identification results by the ORT method are quite good, whereas the results by the CCA method are somewhat degraded compared with those of the ORT method.

The Bode plots of the plant identified by the ORT and CCA methods based on the direct approach are shown in Figures 11.7(a) and 11.7(b), respectively. We clearly see biases in the estimates of the Bode magnitude, where the ORT method provides somewhat larger biases than the CCA method.
Figure 11.6. Bode plots of C(z)

Figure 11.7. Identification results by direct method


Case 2: As the second experiment, we consider the effect of the number of data on the identification performance. For the plant parameter vector $\theta := (-1.6 \;\; 0.89 \;\; 1 \;\; 0) \in \mathbb{R}^4$, the performance is measured by the mean squared norm of the estimation error

$$J_N = \frac{1}{M} \sum_{i=1}^{M} \|\theta - \hat{\theta}(i, N)\|^2$$

where $\hat{\theta}(i, N) \in \mathbb{R}^4$ denotes the estimate of $\theta$ at the $i$th run, the number of data is $N = 200, 500, 1000, 2000, 5000$, and the number of runs is $M = 30$ in each case. Figure 11.8 compares the performance of the identification of the plant transfer function by the ORT and CCA methods. This figure clearly shows the advantage of the ORT-based algorithm over the CCA-based algorithm.
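The performance index $J_N$ is a plain Monte Carlo average, sketched below in NumPy with a hypothetical helper name. The two "estimates" in the usage example are made-up numbers chosen so the result can be checked by hand; they are not results from the experiments.

```python
import numpy as np


def performance_index(theta, theta_hat):
    """J_N = (1/M) sum_i ||theta - theta_hat(i, N)||^2; theta_hat has one run per row."""
    err = np.asarray(theta_hat, dtype=float) - np.asarray(theta, dtype=float)
    return float(np.mean(np.sum(err ** 2, axis=1)))


theta = [-1.6, 0.89, 1.0, 0.0]                 # true plant parameters of Example 1
theta_hat = [[-1.5, 0.89, 1.0, 0.0],           # two hypothetical runs (M = 2)
             [-1.6, 1.09, 1.0, 0.0]]
print(round(performance_index(theta, theta_hat), 6))   # 0.025
```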

As mentioned before, if the exogenous inputs $r_1$, $r_2$ are white noise, then the two algorithms give quite similar identification results. However, if at least one of the exogenous inputs is colored, then we can safely say that the ORT method outperforms the CCA method.
Figure 11.8. Comparison of identification results: the ORT (o - - - o) and CCA (× - - - ×) methods (performance index versus number of data N)


11.6.2 Example 2 


We assume that the plant is a discrete-time model of a laboratory setup of two circular plates rotated by an electrical servo-motor with flexible shafts [169], where the transfer function of the plant $P(z)$ is given by

$$P(z) = \frac{10^{-3}(0.98z^4 + 12.990z^3 + 18.589z^2 + 3.2987z - 0.02)}{z^5 - 4.3986z^4 + 8.0852z^3 - 7.8233z^2 + 3.9954z - 0.8588}$$

and where a stabilizing controller is chosen as

$$C(z) = \frac{0.6300z^4 - 2.0830z^3 + 2.8222z^2 - 1.8650z + 0.4978}{z^4 - 2.6500z^3 + 3.1100z^2 - 1.7500z + 0.3900}$$

The configuration of the plant and controller is the same as the one depicted in Figure 11.3, where the output noise $v$ is a Gaussian white noise sequence with $E\{v^2(t)\} = 1/9$. The reference signals $r_1$ and $r_2$ are Gaussian white noises with variances $\sigma_1^2 = 1$ and $\sigma_2^2 = 0.5$, respectively. Note that $P(z)$ has poles at $z = 1$, $0.9674 \pm 0.1493j$, $0.7319 \pm 0.6005j$; thus the plant has one integrator and is therefore marginally stable. Also, the controller $C(z)$ has poles at $z = 0.7169 \pm 0.6678j$, $0.6081 \pm 0.1910j$; thus the controller is stable. We take the number of data points $N = 4000$ and the number of block rows $k = 15$. We generated 30 data sets, each with the same reference inputs $r_1$ and $r_2$, but with a different noise sequence $v$.
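The listed pole locations can be checked against the printed denominators. The NumPy sketch below rebuilds the monic denominators from the stated poles and compares them with the four-decimal coefficients; rebuilding from the poles is preferred here over rooting the printed coefficients, because the pole at $z = 1$ lies close to the pair $0.9674 \pm 0.1493j$ and is therefore sensitive to coefficient rounding. This is only a consistency check of the numbers quoted above.

```python
import numpy as np

# Printed denominators (highest power of z first)
den_P = np.array([1.0, -4.3986, 8.0852, -7.8233, 3.9954, -0.8588])
den_C = np.array([1.0, -2.6500, 3.1100, -1.7500, 0.3900])

# Poles listed in the text
poles_P = np.array([1.0, 0.9674 + 0.1493j, 0.9674 - 0.1493j, 0.7319 + 0.6005j, 0.7319 - 0.6005j])
poles_C = np.array([0.7169 + 0.6678j, 0.7169 - 0.6678j, 0.6081 + 0.1910j, 0.6081 - 0.1910j])

# Rebuild monic polynomials from the poles and compare with the printed coefficients
rebuilt_P = np.real(np.poly(poles_P))
rebuilt_C = np.real(np.poly(poles_C))
print(np.max(np.abs(rebuilt_P - den_P)), np.max(np.abs(rebuilt_C - den_C)))  # both below 1e-4
print(np.abs(poles_P).max(), np.abs(poles_C).max())  # 1.0 (integrator) and below 1 (stable controller)
```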

In this experiment, we have employed the ORT method. Figure 11.9 shows the estimated eigenvalues of the matrix $A - B_2 D_{22}^{-1} C_2$ [see Lemma 11.4], where + denotes the true poles of the plant, × those of the controller and * the estimated eigenvalues. From Figure 11.9 we can see that the nine poles of the plant and controller are identified very well.

The estimated transfer function $P(z)$ of 5th order is displayed in Figure 11.10. Figure 11.10(a) shows the estimated poles of the plant, where + denotes the true poles and * denotes the estimated poles over 30 experiments. The Bode plot of the estimated transfer function of the plant is depicted in Figure 11.10(b), where the dashed line depicts the true transfer function of the plant and the solid line depicts the average over 30 experiments. From these figures, we can see that the ORT method-based algorithm performs very well in the identification of the plant. Since $C(z)$ is stable, there are no unstable pole-zero cancellations in the reduction of the estimated plant; thus the model reduction appears to be performed well.

Figure 11.9. ORT method: poles of $A - B_2 D_{22}^{-1} C_2$ [(11.26)]; +: plant, ×: controller

Figure 11.10. Identification by ORT method

Furthermore, the estimation results of the controller are depicted in Figures 11.11 and 11.12. As in the case of the plant estimation, the estimation of the controller needs model reduction by approximate pole-zero cancellations. It should be noted that in order to estimate the controller with the same order as the true one, we need to perform an unstable pole-zero cancellation at $z = 1$. Figure 11.11 depicts the estimated controller as a 4th-order model, which is the same order as the true one: Figure 11.11(a) shows the pole estimation, where × denotes the true poles and * the estimated ones, and Figure 11.11(b) shows the Bode plots of the true transfer function (dashed line) and the average transfer function over 30 experiments (solid line). We can see from these figures that there are many incorrect poles around the positive real axis including $z = 1$, and that the Bode plot is biased in the low frequency range.

On the other hand, Figure 11.12 displays the estimated controller as a 5th-order model, i.e., the estimated 9th-order models are reduced to 5th order. In this case, though Figure 11.12(a) shows that there are many incorrect poles around the real axis, we can see from Figure 11.12(b) that the Bode gain of the controller is estimated very well by using a 5th-order transfer function.

Figure 11.12. Identification of controller (5th-order model)


11.7 Notes and References 


• In this chapter, based on Katayama et al. [87, 88], we have developed two closed-loop subspace identification methods based on the ORT and CCA methods derived in Chapters 9 and 10, in the framework of the joint input-output approach. See also Katayama et al. [89], in which the role of the input signal in closed-loop identification is discussed in detail.

• The importance and the basic approaches of closed-loop identification are reviewed in Section 11.1 [48, 109, 145]. In Section 11.2, the problem is formulated and the fundamental idea of the joint input-output approach is explained. The present problem is virtually the same as the one treated in [170].

• Section 11.3 is devoted to the realization of the feedback system and the derivation of a subspace identification method based on the CCA method. Section 11.4 derives a subspace identification method based on the ORT method, and shows that the plant and controller can be identified by a realization of the deterministic component of the joint input-output process.

• Since the transfer matrices derived by the joint input-output approach are necessarily of higher order than the true ones, we have presented a model reduction technique called the SR algorithm in Section 11.5.

• Section 11.6 illustrates the closed-loop identification methods through some numerical results. The performance of closed-loop identification depends on the underlying open-loop identification techniques; the numerical results show that the performance of the ORT-based method is somewhat superior to that of the CCA-based method. Some related numerical results are also found in [89].

• Under the assumption that the plant is stable, a simple closed-loop identification method based on the orthogonal decomposition of the joint input-output process is described in the Appendix below.


11.8 Appendix: Identification of Stable Transfer Matrices 


In this section, as an Appendix to this chapter, we present a simple closed-loop identification procedure using the result of Lemma 11.3 under the assumption that all the open-loop transfer matrices in Figure 11.2 are stable. In the following, Assumptions 1 and 2 stated in Section 11.2 are assumed to be satisfied.


11.8.1 Identification of Deterministic Parts 
From (11.21), we have two deterministic equations

$$y_d(t) = P(z) u_d(t) \qquad (11.46)$$

and

$$\tilde{u}_d(t) = -C(z) \tilde{y}_d(t) \qquad (11.47)$$

where

$$\tilde{y}_d(t) = y_d(t) - r_1(t), \quad \tilde{u}_d(t) = u_d(t) - r_2(t)$$

It should be noted that the above relations are satisfied by the deterministic components $(u_d, y_d)$ and $(\tilde{u}_d, \tilde{y}_d)$, since the noise components are removed in these relations. Thus Figure 11.13 displays two independent open-loop systems for the plant and controller, so that we can use (11.46) and (11.47) to identify the open-loop plant $P(z)$ and the controller $C(z)$ independently. The present idea is somewhat related to the two-stage method [160] and the projection method [49].

Figure 11.13. Plant and controller in terms of deterministic components ($P(z)$: $u_d \to y_d$; $C(z)$: $\tilde{y}_d \to -\tilde{u}_d$)


Identification Algorithm of Plant and Controller 


Step 1: By using the LQ decomposition, we compute the deterministic components $(y_d, u_d)$ of the joint input-output process and then compute $(\tilde u_d, \tilde y_d)$.


Step 2: We apply the ORT (or CCA) method to the input-output data $(u_d, y_d)$ to obtain
$$ x_p(t+1) = A_p x_p(t) + B_p u_d(t) \tag{11.48a} $$
$$ y_d(t) = C_p x_p(t) \tag{11.48b} $$
Then the plant transfer matrix is given by $P(z) = (A_p, B_p, C_p)$.

Step 3: We apply the ORT (or CCA) method to the input-output data $(\tilde u_d, \tilde y_d)$ to obtain
$$ x_c(t+1) = A_c x_c(t) + B_c \tilde y_d(t) \tag{11.49a} $$
$$ -\tilde u_d(t) = C_c x_c(t) + D_c \tilde y_d(t) \tag{11.49b} $$
Then the controller transfer matrix is given by $C(z) = (A_c, B_c, C_c, D_c)$.


For numerical results based on the above technique, see Katayama et al. [92]. 
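In Steps 2 and 3, a transfer matrix is read off from its state space matrices, with the notation $G(z) = (A, B, C, D)$ understood as $G(z) = D + C(zI - A)^{-1}B$ (and $D = 0$ when only three matrices are listed). The NumPy sketch below evaluates such a transfer matrix at a point z and lists its first Markov parameters $D, CB, CAB, \dots$; the helper names and the first-order example are illustrative assumptions, not part of the book's programs.

```python
import numpy as np

def transfer_matrix(A, B, C, D, z):
    """Evaluate G(z) = D + C (zI - A)^{-1} B at a (possibly complex) point z."""
    n = A.shape[0]
    return D + C @ np.linalg.solve(z * np.eye(n) - A, B)

def markov_parameters(A, B, C, D, count):
    """Return the impulse response [D, CB, CAB, ..., C A^(count-2) B] of G(z)."""
    params = [D]
    AkB = B                      # holds A^k B, starting with k = 0
    for _ in range(count - 1):
        params.append(C @ AkB)
        AkB = A @ AkB
    return params

# Illustrative first-order plant P(z) = 1 / (z - 0.5), written as (A, B, C) with D = 0
A = np.array([[0.5]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.zeros((1, 1))
print(transfer_matrix(A, B, C, D, 2.0))                       # [[0.66666667]]
print([g.item() for g in markov_parameters(A, B, C, D, 4)])   # [0.0, 1.0, 0.5, 0.25]
```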


11.8.2 Identification of Noise Models 


We have not discussed the identification of noise models in this chapter. However, they can easily be identified if both the plant and controller are open-loop stable. It should be noted that the noise models are located outside the closed loop, so that the identification of noise models is actually an open-loop identification problem.

Under the assumption that P(z) and C(z) are stable, we compute
$$ \tilde y_s(t) = y_s(t) - P(z)u_s(t) = H(z)v(t) \tag{11.50} $$
and
$$ \tilde u_s(t) = u_s(t) + C(z)y_s(t) = F(z)n(t) \tag{11.51} $$


Figure 11.14 shows the block diagrams of the noise models.

Since $(\tilde u_s, \tilde y_s)$ are second-order jointly stationary processes, we can identify the noise models H(z) and F(z) by applying the CCA method (or a stochastic subspace identification technique).

Figure 11.14. Noise models in terms of stochastic components


In the following, we assume that the plant and controller in Figure 11.13 have already been identified by the procedure stated above.


Identification Algorithm of Noise Models 
Step 1: By using (11.50) and (11.51), we compute $\tilde y_s$ and $\tilde u_s$.

Step 2: Applying the CCA method developed in Chapter 8 to the data $\tilde y_s$, we identify
$$ x_h(t+1) = A_h x_h(t) + K_h e_h(t) \tag{11.52a} $$
$$ \tilde y_s(t) = C_h x_h(t) + e_h(t) \tag{11.52b} $$
Then, the plant noise model is given by $H(z) = (A_h, C_h, K_h, I_p)$.

Step 3: Applying the CCA method developed in Chapter 8 to the data $\tilde u_s$, we identify
$$ x_f(t+1) = A_f x_f(t) + K_f e_f(t) \tag{11.53a} $$
$$ \tilde u_s(t) = C_f x_f(t) + e_f(t) \tag{11.53b} $$
Then, the controller noise model is given by $F(z) = (A_f, C_f, K_f, I_m)$.
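Step 1 requires filtering the data through the identified plant and controller. The NumPy sketch below covers the plant side of (11.50), assuming that $P(z) = (A_p, B_p, C_p)$ has no direct term, that the state starts at zero (harmless here because P(z) is assumed stable, so the initial transient dies out), and that signals are stored as arrays with one row per channel. The controller side of (11.51) is analogous: $\tilde u_s$ would be `u_s + simulate(Ac, Bc, Cc, Dc, y_s)`. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def simulate(A, B, C, D, u):
    """Response of x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t) with x(0) = 0.

    u has shape (m, N); the returned output has shape (p, N)."""
    x = np.zeros(A.shape[0])
    y = np.empty((C.shape[0], u.shape[1]))
    for t in range(u.shape[1]):
        y[:, t] = C @ x + D @ u[:, t]
        x = A @ x + B @ u[:, t]
    return y

def plant_noise_component(Ap, Bp, Cp, u_s, y_s):
    """Eq. (11.50): y_s(t) - P(z) u_s(t) for P(z) = (Ap, Bp, Cp), i.e. D = 0."""
    Dp = np.zeros((Cp.shape[0], Bp.shape[1]))
    return y_s - simulate(Ap, Bp, Cp, Dp, u_s)

# Synthetic check: if y_s = P(z) u_s + v exactly, the function returns v
rng = np.random.default_rng(0)
Ap = np.array([[0.8]]); Bp = np.array([[1.0]]); Cp = np.array([[0.5]])
u_s = rng.standard_normal((1, 200))
v = 0.1 * rng.standard_normal((1, 200))      # plays the role of H(z)v(t)
y_s = simulate(Ap, Bp, Cp, np.zeros((1, 1)), u_s) + v
print(np.allclose(plant_noise_component(Ap, Bp, Cp, u_s, y_s), v))   # True
```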


Appendix 


A 


Least-Squares Method 


We briefly review the least-squares method for a linear regression model, together 
with its relation to the LQ decomposition. 


A.1 Linear Regressions 


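Suppose that there exists a linear relation between the output variable y(t) and the d-dimensional regression vector $\varphi(t) = [\varphi_1(t)\ \varphi_2(t)\ \cdots\ \varphi_d(t)]^T$. We assume that N observations $\{y(t), \varphi(t),\ t = 0, 1, \dots, N-1\}$ are given. Then, it follows that
$$ y(t) = \varphi^T(t)\theta + e(t), \qquad t = 0, 1, \dots, N-1 \tag{A.1} $$
where e(t) denotes the measurement noise, or the variation in y(t) that cannot be explained by means of $\varphi_1(t), \dots, \varphi_d(t)$. We also assume that $\varphi_1(t), \dots, \varphi_d(t)$ have no uncertainties¹.

¹If $\varphi_1, \dots, \varphi_d$ are also subject to noise, (A.1) is called an errors-in-variables model.

For simplicity, we define the stacked vectors
$$ \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_d \end{bmatrix}, \quad y = \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix}, \quad e = \begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(N-1) \end{bmatrix} $$
and the matrix
$$ \Phi = \begin{bmatrix} \varphi_1(0) & \varphi_2(0) & \cdots & \varphi_d(0) \\ \varphi_1(1) & \varphi_2(1) & \cdots & \varphi_d(1) \\ \vdots & \vdots & & \vdots \\ \varphi_1(N-1) & \varphi_2(N-1) & \cdots & \varphi_d(N-1) \end{bmatrix} = \begin{bmatrix} \varphi^T(0) \\ \varphi^T(1) \\ \vdots \\ \varphi^T(N-1) \end{bmatrix} $$
where $\Phi \in \mathbb{R}^{N \times d}$. Then (A.1) can be written as
$$ y = \Phi\theta + e $$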


This is referred to as a linear regression model. The regression analysis involves the 
estimation of unknown parameters and the analysis of residuals. 
The basic assumptions needed for the least-squares method are listed below. 
A1) The error vector e is uncorrelated with $\Phi$ and $\theta$.
A2) The error vector e is a random vector with mean zero.
A3) The covariance matrix of the error vector is $\sigma^2 I_N$ with $\sigma^2 > 0$.
A4) The column vectors of $\Phi$ are linearly independent, i.e., $\mathrm{rank}(\Phi) = d$.


Under the above assumptions, we consider the least-squares problem of minimizing the quadratic performance index
$$ J(\theta) = \sum_{t=0}^{N-1}[y(t) - \varphi^T(t)\theta]^2 = \|y - \Phi\theta\|^2 $$


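Setting the gradient of $J(\theta)$ with respect to $\theta$ to zero yields
$$ \Big(\sum_{t=0}^{N-1}\varphi(t)\varphi^T(t)\Big)\theta = \sum_{t=0}^{N-1}\varphi(t)y(t) \iff (\Phi^T\Phi)\theta = \Phi^T y \tag{A.2} $$
This is the well-known normal equation.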
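From Assumption A4), we see that $\Phi^T\Phi \in \mathbb{R}^{d\times d}$ is nonsingular. Thus, solving (A.2), the least-squares estimate is given by
$$ \hat\theta_{LS} = \Big(\sum_{t=0}^{N-1}\varphi(t)\varphi^T(t)\Big)^{-1}\sum_{t=0}^{N-1}\varphi(t)y(t) = (\Phi^T\Phi)^{-1}\Phi^T y \tag{A.3} $$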
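The estimate (A.3) is easy to check numerically. The NumPy sketch below, with synthetic data and a hypothetical true parameter, solves the normal equation (A.2) directly and compares the result with `numpy.linalg.lstsq`, which works on $\Phi$ itself through an SVD and therefore avoids squaring the condition number by forming $\Phi^T\Phi$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 3
Phi = rng.standard_normal((N, d))                 # regression matrix, rank d (A4)
theta_true = np.array([1.0, -2.0, 0.5])           # hypothetical true parameter
y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

# (A.3): solve (Phi^T Phi) theta = Phi^T y
theta_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Same minimizer of ||y - Phi theta||^2, computed without forming Phi^T Phi
theta_svd, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.allclose(theta_ls, theta_svd))           # True
print(theta_ls)                                   # close to theta_true
```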


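Also, from Assumptions A1) and A2),
$$ E\{\hat\theta_{LS}\} = E\{(\Phi^T\Phi)^{-1}\Phi^T(\Phi\theta + e)\} = \theta + (\Phi^T\Phi)^{-1}\Phi^T E\{e\} = \theta \tag{A.4} $$
so that the least-squares estimate $\hat\theta_{LS}$ is unbiased. It follows from Assumption A3) that the error covariance matrix of the estimate $\hat\theta_{LS}$ is
$$ \mathrm{cov}\{\hat\theta_{LS}\} = E\{[\theta - \hat\theta_{LS}][\theta - \hat\theta_{LS}]^T\} = \sigma^2(\Phi^T\Phi)^{-1} $$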
Moreover, define the residual vector as $\varepsilon := y - \Phi\hat\theta_{LS}$. Then, it follows that
$$ \varepsilon = [I_N - \Phi(\Phi^T\Phi)^{-1}\Phi^T]y = [I_N - \Phi(\Phi^T\Phi)^{-1}\Phi^T]e \tag{A.5} $$
It should be noted that $\Pi := \Phi(\Phi^T\Phi)^{-1}\Phi^T$ satisfies $\Pi^2 = \Pi$ and $\Pi = \Pi^T$, so that $\Pi$ is an orthogonal projection onto $\mathrm{Im}(\Phi)$. Also, $\Pi^\perp := I_N - \Pi$ is an orthogonal projection onto the orthogonal complement $(\mathrm{Im}\,\Phi)^\perp = \mathrm{Ker}(\Phi^T)$. Then, $\|\varepsilon\|^2 = y^T(I_N - \Pi)y$ is the square of the minimum distance between the point y and the space $\mathrm{Im}(\Phi)$.

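We compute the variance of the residual. From (A.5),
$$ \begin{aligned} E\{\|\varepsilon\|^2\} &= \mathrm{trace}\,E\{\varepsilon\varepsilon^T\} = \mathrm{trace}\big([I_N - \Pi]\,E\{ee^T\}\,[I_N - \Pi]^T\big) \\ &= \sigma^2\,\mathrm{trace}[I_N - \Pi] = \sigma^2\big[\mathrm{trace}(I_N) - \mathrm{trace}(\Phi(\Phi^T\Phi)^{-1}\Phi^T)\big] = \sigma^2(N - d) \end{aligned} $$
where we have used $(I_N - \Pi)(I_N - \Pi)^T = I_N - \Pi$ and $\mathrm{trace}(\Phi(\Phi^T\Phi)^{-1}\Phi^T) = \mathrm{trace}((\Phi^T\Phi)^{-1}\Phi^T\Phi) = \mathrm{trace}(I_d) = d$.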


Hence, an unbiased estimate of the variance $\sigma^2$ is given by
$$ \hat\sigma^2 = \frac{1}{N-d}\sum_{t=0}^{N-1}\varepsilon^2(t) = \frac{\|\varepsilon\|^2}{N-d} $$
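A short NumPy sketch of the projection $\Pi$, the residual (A.5), and the unbiased variance estimate $\hat\sigma^2$, on synthetic data with an assumed noise level $\sigma = 0.3$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, sigma = 500, 4, 0.3                          # assumed sizes and noise level
Phi = rng.standard_normal((N, d))
y = Phi @ np.ones(d) + sigma * rng.standard_normal(N)

Pi = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)     # orthogonal projection onto Im(Phi)
eps = y - Pi @ y                                   # residual vector (A.5)

print(np.allclose(Pi @ Pi, Pi), np.allclose(Pi, Pi.T))   # True True
print(np.isclose(np.trace(Pi), d))                       # True: trace(Pi) = d
print(np.allclose(Phi.T @ eps, 0.0))                     # True: eps lies in Ker(Phi^T)

sigma2_hat = eps @ eps / (N - d)                   # ||eps||^2 / (N - d)
print(sigma2_hat)                                  # close to sigma**2 = 0.09
```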


In practice, the above assumptions A1) to A4) are not completely satisfied. If either A1) or A2) is not satisfied, then a bias arises in the least-squares estimate. In fact, in the computation of (A.4), we have
$$ E\{\hat\theta_{LS}\} = \theta + E\{(\Phi^T\Phi)^{-1}\Phi^T e\} \neq \theta $$


Suppose that $E\{ee^T\} = R > 0$, so that A3) does not hold. In this case, we consider a weighted least-squares problem of minimizing
$$ J(\theta) = \|y - \Phi\theta\|_{R^{-1}}^2 = (y - \Phi\theta)^T R^{-1}(y - \Phi\theta) $$
By using the same technique as in deriving the least-squares estimate $\hat\theta_{LS}$, we can show that the optimal estimate is given by
$$ \hat\theta_{GLS} := (\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}y $$
where $\hat\theta_{GLS}$ is called the generalized least-squares estimate. The corresponding error covariance matrix becomes
$$ \mathrm{cov}\{\hat\theta_{GLS}\} = (\Phi^T R^{-1}\Phi)^{-1} $$
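As a sketch of the generalized least-squares estimate, assume a diagonal noise covariance $R = \mathrm{diag}(r_0, \dots, r_{N-1})$ with known, hypothetical variances. Solving with $R^{-1}$ directly gives the same result as scaling each equation by $1/\sqrt{r_t}$ and applying ordinary least squares, which is a common way to implement the weighting.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
Phi = rng.standard_normal((N, 2))
r = np.linspace(0.01, 1.0, N)                      # assumed noise variances, R = diag(r)
y = Phi @ np.array([2.0, -1.0]) + np.sqrt(r) * rng.standard_normal(N)

R_inv = np.diag(1.0 / r)
M = Phi.T @ R_inv @ Phi                            # Phi^T R^{-1} Phi
theta_gls = np.linalg.solve(M, Phi.T @ R_inv @ y)  # generalized LS estimate
cov_gls = np.linalg.inv(M)                         # its error covariance matrix

# Equivalent route: whiten each equation, then use ordinary least squares
w = 1.0 / np.sqrt(r)
theta_white, *_ = np.linalg.lstsq(Phi * w[:, None], y * w, rcond=None)
print(np.allclose(theta_gls, theta_white))         # True
```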


We now turn to Assumption A4). In real problems, we often encounter the case where there exist some "approximate" linear relations among the regression vectors (column vectors of $\Phi$); this is called the multicollinearity problem in econometrics. In this case, one or more eigenvalues of $\Phi^T\Phi$ get close to zero, so that the condition number $\kappa(\Phi)$ becomes very large, leading to unreliable least-squares estimates. An SVD-based method of solving a least-squares problem under ill-conditioning is introduced in Section 2.7. There are also other methods to solve ill-conditioned least-squares problems, including regularization methods, ridge regression, etc.
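A small illustration of multicollinearity and of ridge regression, one of the alternatives mentioned above, which replaces $\Phi^T\Phi$ by $\Phi^T\Phi + \lambda I_d$. The data, the degree of collinearity, and the value $\lambda = 10^{-3}$ are arbitrary choices for this sketch; in practice $\lambda$ has to be tuned.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
x1 = rng.standard_normal(N)
x2 = x1 + 1e-6 * rng.standard_normal(N)           # nearly a copy of x1
Phi = np.column_stack([x1, x2])
y = x1 + x2 + 0.01 * rng.standard_normal(N)       # true theta = (1, 1)

print(np.linalg.cond(Phi))                         # very large condition number kappa(Phi)

theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # ordinary LS: erratic components
lam = 1e-3                                            # hypothetical ridge parameter
theta_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2), Phi.T @ y)
print(theta_ls, theta_ridge)                       # ridge stays near (1, 1)
```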
Example A.1. Consider the normal equation of (A.2):
$$ (\Phi^T\Phi)\theta = \Phi^T y, \qquad \Phi \in \mathbb{R}^{N\times d},\ y \in \mathbb{R}^N \tag{A.6} $$
We show that (A.6) always has a solution for any $y \in \mathbb{R}^N$, even when $\mathrm{rank}(\Phi) < d$. It is well known that (A.6) is solvable if and only if the vector $\Phi^T y$ belongs to $\mathrm{Im}(\Phi^T\Phi)$. This is easily verified by noting that $\Phi^T y \in \mathrm{Im}(\Phi^T) = \mathrm{Im}(\Phi^T\Phi)$.

By direct manipulation, we can show that $\theta = \Phi^\dagger y$ is a solution of the normal equation, where $\Phi^\dagger$ is the pseudo-inverse defined in Lemma 2.10. Indeed, we have
$$ (\Phi^T\Phi)\Phi^\dagger y = \Phi^T(\Phi\Phi^\dagger)y = \Phi^T(\Phi\Phi^\dagger)^T y = (\Phi\Phi^\dagger\Phi)^T y = \Phi^T y $$
where the Moore-Penrose conditions are used (see Problem 2.9). Also, the general solution
$$ \theta = \Phi^\dagger y + (I_d - \Phi^\dagger\Phi)z, \qquad \forall z \in \mathbb{R}^d $$
satisfies the normal equation.
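Let $\Theta$ be the set of minimizers
$$ \Theta := \{\theta \mid \|y - \Phi\theta\| = \min\} $$
Then, we can show that
1. If $\theta$ is a minimizer, i.e., $\theta \in \Theta$, then $\Phi^T(y - \Phi\theta) = 0$, and vice versa.
2. If $\mathrm{rank}(\Phi) = d$, then $\Theta = \{\hat\theta_{LS}\}$, a singleton.
3. The set $\Theta$ is convex.
4. The set $\Theta$ has a unique minimum norm solution $\theta = \Phi^\dagger y$.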
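Properties 1 to 4 can be observed on a rank-deficient example. In the NumPy sketch below, the second column of $\Phi$ is exactly twice the first, so $\mathrm{rank}(\Phi) = 2 < d = 3$ and $\Theta$ is not a singleton. `numpy.linalg.lstsq`, with a cutoff for tiny singular values, returns the minimum norm solution $\Phi^\dagger y$; adding any vector of $\mathrm{Ker}(\Phi)$ gives another minimizer with a larger norm. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 50
a = rng.standard_normal(N)
Phi = np.column_stack([a, 2.0 * a, rng.standard_normal(N)])   # rank(Phi) = 2 < d = 3
y = rng.standard_normal(N)

# Minimum norm solution Phi^+ y (singular values below 1e-10 * s_max are treated as zero)
theta_mn, *_ = np.linalg.lstsq(Phi, y, rcond=1e-10)

# Ker(Phi) is spanned by (2, -1, 0)/sqrt(5) since 2*(column 1) - (column 2) = 0
null = np.array([2.0, -1.0, 0.0]) / np.sqrt(5.0)
theta_other = theta_mn + 0.7 * null                # another element of Theta

for th in (theta_mn, theta_other):
    print(np.allclose(Phi.T @ Phi @ th, Phi.T @ y))      # True: normal equation (A.6)
print(np.isclose(theta_mn @ null, 0.0))                  # True: orthogonal to Ker(Phi)
print(np.linalg.norm(theta_mn) < np.linalg.norm(theta_other))   # True
```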


We apply the regression analysis technique to an ARX model², leading to a least-squares identification method, which is one of the simplest methods for a realistic identification problem.


Example A.2. Consider an ARX model
$$ A(z)y(t) = B(z)u(t) + e(t) \tag{A.7} $$
with $A(z) = 1 + a_1z^{-1} + \cdots + a_nz^{-n}$ and $B(z) = b_1z^{-1} + \cdots + b_mz^{-m}$, where the unknown parameters are $\theta := (a_1\ \cdots\ a_n\ b_1\ \cdots\ b_m)^T$ and the noise variance $\sigma^2$. This is also called an equation error model, which is most easily identified by using the least-squares method. It should be noted that the ARX model of (A.7) is derived from (1.1) by setting $H(z) = 1/A(z)$.

From (A.7), the prediction error is given by
$$ \varepsilon(t, \theta) := A(z, \theta)y(t) - B(z, \theta)u(t) = y(t) - \varphi^T(t)\theta \tag{A.8} $$
where $\varphi(t)$ is the regression vector defined by
$$ \varphi(t) = [-y(t-1)\ \cdots\ -y(t-n)\ \ u(t-1)\ \cdots\ u(t-m)]^T \in \mathbb{R}^{n+m} $$
Also, the unknown parameter vector is given by
$$ \theta := (a_1\ \cdots\ a_n\ b_1\ \cdots\ b_m)^T $$
Thus, it follows from (1.3) and (A.8) that the prediction error criterion becomes
$$ \frac{1}{N}\sum_{t=0}^{N-1}[y(t) - \varphi^T(t)\theta]^2 $$

²ARX = AutoRegressive with eXogenous input.


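This implies that the PEM as applied to ARX models reduces to the least-squares method, so that the optimal estimate is given by
$$ \hat\theta_{LS}(N) = \Big(\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)\varphi^T(t)\Big)^{-1}\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)y(t) \tag{A.9} $$
Suppose that the actual observations are expressed as
$$ y(t) = \varphi^T(t)\theta_0 + v_0(t) \tag{A.10} $$
where $v_0$ is a noise, and $\theta_0$ is the "true" parameter. Substituting the above equation into (A.9) yields
$$ \hat\theta_{LS}(N) = \theta_0 + \Big(\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)\varphi^T(t)\Big)^{-1}\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)v_0(t) $$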
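Suppose that
LS1) $\lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)\varphi^T(t) = E\{\varphi(t)\varphi^T(t)\}$ is nonsingular
LS2) $\lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\varphi(t)v_0(t) = E\{\varphi(t)v_0(t)\} = 0$
hold³. Then we can show that
$$ \lim_{N\to\infty}\hat\theta_{LS}(N) = \theta_0 $$
Thus the least-squares estimate is consistent.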
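To see the ARX least-squares estimate (A.9) at work, the NumPy sketch below simulates a hypothetical second-order ARX model with white equation error, so that LS1) and LS2) hold, builds the regression vectors $\varphi(t)$, and solves the normal equation. The model orders, coefficients, and noise level are assumptions chosen for illustration.

```python
import numpy as np

def arx_regressors(y, u, n, m):
    """Stack phi(t)^T = [-y(t-1) ... -y(t-n), u(t-1) ... u(t-m)] for t = max(n, m), ..., N-1."""
    t0 = max(n, m)
    Phi = np.array([np.concatenate([-y[t - n:t][::-1], u[t - m:t][::-1]])
                    for t in range(t0, len(y))])
    return Phi, y[t0:]

rng = np.random.default_rng(6)
N = 5000
theta0 = np.array([-1.5, 0.7, 1.0, 0.5])   # assumed (a1, a2, b1, b2)
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)           # white equation error
y = np.zeros(N)
for t in range(2, N):
    y[t] = (-theta0[0] * y[t - 1] - theta0[1] * y[t - 2]
            + theta0[2] * u[t - 1] + theta0[3] * u[t - 2] + e[t])

Phi, Y = arx_regressors(y, u, n=2, m=2)
theta_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # (A.9); the 1/N factors cancel
print(theta_ls)                                       # close to theta0 for large N
```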


For convergence results based on laws of large numbers, see [109, 145]. If the above condition LS2) is not satisfied, then the least-squares estimate remains biased even as $N \to \infty$. In order to obtain a consistent estimate, we can employ a vector sequence correlated with the regression vector $\varphi(t)$ but uncorrelated with the external noise $v_0(t)$.

³The second condition is satisfied if $v_0$ is a filtered white noise and $\varphi(t)$ is a bounded sequence [109].
Example A.3. (IV estimate) Let $\zeta(t) \in \mathbb{R}^{n+m}$ be a vector sequence. Pre-multiplying (A.10) by $\zeta(t)$ and summing over $[0, N-1]$ yield
$$ \frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)y(t) = \Big(\frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)\varphi^T(t)\Big)\theta_0 + \frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)v_0(t) $$
Suppose that the vector $\zeta(t)$ satisfies two conditions
IV1) $\lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)\varphi^T(t) = E\{\zeta(t)\varphi^T(t)\}$ is nonsingular
IV2) $\lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)v_0(t) = E\{\zeta(t)v_0(t)\} = 0$
Then, we obtain a consistent estimate
$$ \hat\theta_{IV}(N) = \Big(\frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)\varphi^T(t)\Big)^{-1}\frac{1}{N}\sum_{t=0}^{N-1}\zeta(t)y(t) \tag{A.11} $$
This estimate is usually called an instrumental variable (IV) estimate, and the vectors $\zeta(t)$ satisfying the conditions IV1) and IV2) are called IV vectors.


Detailed discussions on the IV estimate, including the best choice of the IV vector 
and convergence results, are found in [109, 145]. 
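The NumPy sketch below illustrates the IV estimate (A.11) on a hypothetical ARX system whose equation error $v_0(t) = e(t) + 0.9e(t-1)$ is colored, so that LS2) fails because $y(t-1)$ is correlated with $v_0(t)$. Delayed inputs $\zeta(t) = [u(t-1)\ u(t-2)\ u(t-3)\ u(t-4)]^T$ serve as IV vectors: since u is independent of e, IV2) holds, and IV1) holds for this particular system. This choice of instruments is a common textbook option, not necessarily the best one.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50000
theta0 = np.array([-1.5, 0.7, 1.0, 0.5])       # assumed (a1, a2, b1, b2)
u = rng.standard_normal(N)
e = rng.standard_normal(N)
v0 = e.copy()
v0[1:] += 0.9 * e[:-1]                         # colored noise v0(t) = e(t) + 0.9 e(t-1)
y = np.zeros(N)
for t in range(2, N):
    y[t] = (-theta0[0] * y[t - 1] - theta0[1] * y[t - 2]
            + theta0[2] * u[t - 1] + theta0[3] * u[t - 2] + v0[t])

ts = np.arange(4, N)
Phi = np.column_stack([-y[ts - 1], -y[ts - 2], u[ts - 1], u[ts - 2]])   # phi(t)^T
Zeta = np.column_stack([u[ts - 1], u[ts - 2], u[ts - 3], u[ts - 4]])    # zeta(t)^T
Y = y[ts]

theta_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)     # (A.9): biased because LS2) fails
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ Y)   # (A.11): consistent
print("LS error:", np.abs(theta_ls - theta0).max())
print("IV error:", np.abs(theta_iv - theta0).max())
```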


A.2 LQ Decomposition 


We consider the relation between the least-squares method and the LQ decomposition, which is a key technique in subspace identification methods.
Consider an FIR (finite impulse response) model
$$ y(t) = \sum_{i=0}^{k-1} g_i u(t-i) + e(t) \tag{A.12} $$
where e is a white noise with mean zero and variance $\sigma^2$. The problem is to identify the impulse responses $\theta := (g_{k-1}\ \cdots\ g_1\ g_0)^T$ based on the input-output data $\{u(t), y(t),\ t = 0, 1, \dots, N+k-2\}$. We define a data matrix
$$ \begin{bmatrix} U_{0|k-1} \\ Y_{k-1|k-1} \end{bmatrix} = \begin{bmatrix} u(0) & u(1) & \cdots & u(N-1) \\ u(1) & u(2) & \cdots & u(N) \\ \vdots & \vdots & & \vdots \\ u(k-1) & u(k) & \cdots & u(N+k-2) \\ y(k-1) & y(k) & \cdots & y(N+k-2) \end{bmatrix} \in \mathbb{R}^{(k+1)\times N} $$
where we assume that $U_{0|k-1}$ has full row rank, so that $\mathrm{rank}(U_{0|k-1}) = k$.
We temporarily assume that $\sigma^2 = 0$. Then from (A.12), we get
$$ [g_{k-1}\ \ g_{k-2}\ \ \cdots\ \ g_0\ \ -1]\begin{bmatrix} u(0) & u(1) & \cdots & u(N-1) \\ u(1) & u(2) & \cdots & u(N) \\ \vdots & \vdots & & \vdots \\ u(k-1) & u(k) & \cdots & u(N+k-2) \\ y(k-1) & y(k) & \cdots & y(N+k-2) \end{bmatrix} = 0 $$
or this can be simply written as
$$ [\theta^T\ \ -1]\begin{bmatrix} U_{0|k-1} \\ Y_{k-1|k-1} \end{bmatrix} = 0 \tag{A.13} $$
As shown in Example 6.2, this problem can be solved by using the SVD of the data matrix. In fact, let
$$ \begin{bmatrix} U_{0|k-1} \\ Y_{k-1|k-1} \end{bmatrix} = USV^T $$
Since the last singular value is zero due to (A.13), i.e., $\sigma_{k+1} = 0$, the $(k+1)$th left singular vector $u_{k+1}$ satisfies
$$ u_{k+1}^T\begin{bmatrix} U_{0|k-1} \\ Y_{k-1|k-1} \end{bmatrix} = 0 $$
Thus, normalizing the vector $u_{k+1}$ so that the last element becomes $-1$, we obtain an estimate of the vector $\theta$.

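Now we assume that $\sigma^2 > 0$, in which case no $\theta$ satisfies (A.13), so that we must take a different route to estimate the vector $\theta$. The LQ decomposition of the data matrix yields
$$ \begin{bmatrix} U_{0|k-1} \\ Y_{k-1|k-1} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{A.14} $$
where $L_{11} \in \mathbb{R}^{k\times k}$, $L_{22} \in \mathbb{R}^{1\times 1}$, $L_{21} \in \mathbb{R}^{1\times k}$, and the matrices $Q_1 \in \mathbb{R}^{N\times k}$, $Q_2 \in \mathbb{R}^{N\times 1}$ have orthonormal columns with $Q_1^TQ_2 = 0$. By the rank condition for $U_{0|k-1}$, we see that $\det(L_{11}) \neq 0$, so that
$$ Y_{k-1|k-1} = L_{21}Q_1^T + L_{22}Q_2^T = L_{21}L_{11}^{-1}U_{0|k-1} + L_{22}Q_2^T $$
Since $Q_1^TQ_2 = 0$, the two terms in the right-hand side of the above equation are uncorrelated. Define
$$ [\hat g_{k-1}\ \cdots\ \hat g_1\ \hat g_0] = L_{21}L_{11}^{-1} $$
and
$$ [\hat e(k-1)\ \ \hat e(k)\ \ \cdots\ \ \hat e(N+k-2)] = L_{22}Q_2^T $$
Then, for $t = k-1, k, \dots, N+k-2$, we have
$$ y(t) = \hat g_0u(t) + \hat g_1u(t-1) + \cdots + \hat g_{k-1}u(t-k+1) + \hat e(t) $$
This is the same FIR model as (A.12), implying that $\hat\theta^T = L_{21}L_{11}^{-1} \in \mathbb{R}^{1\times k}$ gives the least-squares estimates of the impulse response parameters.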
We show that the above result can also be derived by solving the normal equation. The identification problem for the FIR model (A.12) can be cast into a least-squares problem
$$ \min_\theta J(\theta) = \|Y_{k-1|k-1}^T - U_{0|k-1}^T\theta\|^2 $$
Thus, from (A.3) and (A.14), the least-squares estimate is given by
$$ \begin{aligned} \hat\theta &= (U_{0|k-1}U_{0|k-1}^T)^{-1}U_{0|k-1}Y_{k-1|k-1}^T \\ &= (L_{11}L_{11}^T)^{-1}L_{11}Q_1^T(Q_1L_{21}^T + Q_2L_{22}^T) = (L_{11}L_{11}^T)^{-1}L_{11}L_{21}^T = (L_{21}L_{11}^{-1})^T \end{aligned} $$
This is exactly the same as the least-squares estimate of $\theta$ obtained above by using the LQ decomposition. Thus we conclude that the least-squares problem can be solved by using the LQ decomposition.
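The equivalence can be checked numerically. The NumPy sketch below forms the data matrix of (A.12) to (A.14) for a hypothetical FIR system, obtains the LQ factor as the transpose of the R factor of a QR decomposition of the transposed data matrix, computes $\hat\theta^T = L_{21}L_{11}^{-1}$, and compares it with an ordinary least-squares fit on the same data. The helper name `fir_by_lq` is illustrative.

```python
import numpy as np

def fir_by_lq(u, y, k):
    """Estimate g_0, ..., g_{k-1} of (A.12) as theta^T = L21 L11^{-1} from (A.14).

    Uses u(0..N+k-2) and y(k-1..N+k-2) with N = len(u) - k + 1 columns."""
    N = len(u) - k + 1
    U = np.array([u[i:i + N] for i in range(k)])    # U_{0|k-1}: row i is u(i), ..., u(i+N-1)
    Y = y[k - 1:k - 1 + N][np.newaxis, :]           # Y_{k-1|k-1}: y(k-1), ..., y(N+k-2)
    R = np.linalg.qr(np.vstack([U, Y]).T, mode='r') # [U; Y]^T = Q R, so [U; Y] = R^T Q^T
    L = R.T
    L11, L21 = L[:k, :k], L[k:, :k]
    theta = np.linalg.solve(L11.T, L21.T).ravel()   # theta = L11^{-T} L21^T = (g_{k-1}, ..., g_0)
    return theta[::-1]                              # reorder as (g_0, ..., g_{k-1})

rng = np.random.default_rng(8)
g_true = np.array([0.0, 1.0, 0.6, 0.3, 0.1])        # hypothetical impulse response g_0..g_4
u = rng.standard_normal(2000)
y = np.convolve(u, g_true)[:len(u)] + 0.05 * rng.standard_normal(len(u))
g_lq = fir_by_lq(u, y, k=5)

# Ordinary least squares on y(t) = sum_i g_i u(t-i), t = 4, ..., 1999 (same data)
Phi = np.array([u[t - np.arange(5)] for t in range(4, len(u))])
g_ls, *_ = np.linalg.lstsq(Phi, y[4:], rcond=None)
print(np.allclose(g_lq, g_ls))                      # True
```

The signs of the diagonal entries of the QR factor are not fixed, but they cancel in $L_{21}L_{11}^{-1}$, so the estimate does not depend on them.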


B 


Input Signals for System Identification 


The selection of input signals has crucial effects on identification results. In this section, several input signals used for system identification are described, including step signals, sinusoids, and random signals. One of the most important concepts related to input signals is the persistently exciting (PE) condition.

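Let u(t), t = 0, 1, ... be a deterministic function. Then we define the mean and auto-covariance function as
$$ \mu_u = \lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}u(t) \tag{B.1} $$
and for $l = 0, \pm 1, \dots$
$$ \Lambda_{uu}(l) = \lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}[u(t+l) - \mu_u][u(t) - \mu_u] \tag{B.2} $$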
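For finite data, (B.1) and (B.2) are approximated by dropping the limit. The NumPy sketch below assumes nonnegative lags and the same $1/N$ normalization as (B.2). For the sinusoid of Example B.1(b) it reproduces $(a^2/2)\cos(\omega l)$ up to $O(1/N)$ errors; the amplitude, frequency, and phase are arbitrary.

```python
import numpy as np

def sample_mean_autocov(u, max_lag):
    """Finite-N versions of (B.1) and (B.2) for lags l = 0, 1, ..., max_lag."""
    N = len(u)
    mu = u.mean()
    d = u - mu
    lam = np.array([np.dot(d[l:], d[:N - l]) / N for l in range(max_lag + 1)])
    return mu, lam

a, omega, phi = 2.0, 0.9, 0.3                        # arbitrary sinusoid parameters
t = np.arange(100_000)
u = a * np.sin(omega * t + phi)
mu, lam = sample_mean_autocov(u, max_lag=5)
print(abs(mu) < 1e-3)                                                        # True
print(np.allclose(lam, a**2 / 2 * np.cos(omega * np.arange(6)), atol=1e-3))  # True
```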


Example B.1. (a) A step function is defined by
$$ u(t) = \begin{cases} 1, & t = 0, 1, \dots \\ 0, & t < 0 \end{cases} $$
In this case, we have $\mu_u = 1$ and $\Lambda_{uu}(l) = 0$ for $l = 0, \pm 1, \dots$.
(b) Consider a sinusoid defined by
$$ u(t) = a\sin(\omega t + \phi), \qquad t = 0, 1, \dots \tag{B.3} $$
where $\omega > 0$ denotes the angular frequency, and $a > 0$ and $0 \le \phi < \pi$ are the amplitude and phase, respectively. Let
$$ S_N = \sum_{t=0}^{N-1}\sin(\omega t + \phi), \qquad C_N = \sum_{t=0}^{N-1}\cos(\omega t + \phi) $$
Since $\lim_{N\to\infty} S_N/N = 0$ and $\lim_{N\to\infty} C_N/N = 0$ hold, we have
$$ \Lambda_{uu}(l) = \lim_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}a^2\sin(\omega(t+l) + \phi)\sin(\omega t + \phi) = \frac{a^2}{2}\cos(\omega l), \qquad l = 0, \pm 1, \dots $$
where the formula $\sin\alpha\sin\beta = [\cos(\alpha - \beta) - \cos(\alpha + \beta)]/2$ is used.
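Also, consider a composite sinusoid
$$ u(t) = \sum_{j=1}^{p} a_j\sin(\omega_j t + \phi_j), \qquad t = 0, 1, \dots \tag{B.4} $$
where $0 < \omega_1 < \cdots < \omega_p$ denote the angular frequencies, and $\{a_j\}$ and $\{\phi_j\}$ denote the amplitudes and phases, respectively. Then, it can be shown that
$$ \Lambda_{uu}(l) = \sum_{j=1}^{p}\frac{a_j^2}{2}\cos(\omega_j l), \qquad l = 0, \pm 1, \dots $$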


(c) In system identification, a pseudo-random binary signal (PRBS) shown in Figure B.1 is often employed as a test input. The PRBS is a periodic sequence with the maximum period $N = 2^p - 1$, where p is an integer greater than three, and is easily generated by p-stage shift registers. It is shown [145] that the mean and auto-covariance of a PRBS taking values $\pm a$ are given by
$$ \mu_u = \frac{1}{N}\sum_{t=1}^{N}u(t) = \frac{a}{N} \tag{B.5} $$
$$ \Lambda_{uu}(l) = \begin{cases} a^2\Big(1 - \dfrac{1}{N^2}\Big), & l = 0 \pmod N \\ -\dfrac{a^2}{N}\Big(1 + \dfrac{1}{N}\Big), & l \neq 0 \pmod N \end{cases} \tag{B.6} $$
The auto-covariance function is shown in Figure B.2.


Figure B.1. A PRBS with the maximum period N = 15

Figure B.2. The auto-covariance sequence of PRBS
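A PRBS with the maximum period $N = 2^p - 1$ can be produced by a linear feedback shift register. The sketch below assumes p = 4 and the recursion $s(t) = s(t-3) \oplus s(t-4)$, whose characteristic polynomial $x^4 + x + 1$ is primitive over GF(2), so the period is 15 as in Figure B.1. Bits are mapped to $\pm a$, and the mean and circular auto-covariance over one period are compared with (B.5) and (B.6). The function names and the amplitude are illustrative.

```python
import numpy as np

def lfsr_prbs(taps, seed, amplitude=1.0):
    """One period of s(t) = XOR of s(t - j), j in taps, mapped 1 -> +amplitude, 0 -> -amplitude.

    seed gives s(0), ..., s(p-1) with p = max(taps); the period equals 2**p - 1 only when
    the taps define a primitive polynomial, which the caller must ensure."""
    p = max(taps)
    assert len(seed) == p and any(seed)
    s = list(seed)
    while len(s) < 2**p - 1:
        t = len(s)
        bit = 0
        for j in taps:
            bit ^= s[t - j]
        s.append(bit)
    return amplitude * (2.0 * np.array(s) - 1.0)

def circular_mean_autocov(u):
    """Mean and auto-covariance over one period, with lags taken modulo N."""
    N = len(u)
    mu = u.mean()
    d = u - mu
    return mu, np.array([np.dot(np.roll(d, -l), d) / N for l in range(N)])

a = 2.0
u = lfsr_prbs(taps=(3, 4), seed=(1, 0, 0, 0), amplitude=a)
N = len(u)                                               # 15
mu, lam = circular_mean_autocov(u)
print(np.isclose(mu, a / N))                             # True, cf. (B.5)
print(np.isclose(lam[0], a**2 * (1 - 1 / N**2)))         # True, cf. (B.6) with l = 0
print(np.allclose(lam[1:], -(a**2 / N) * (1 + 1 / N)))   # True, cf. (B.6) with l != 0
```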


In order to explain the PE condition of input signals, we consider the same FIR model as (A.12):
$$ y(t) = \sum_{i=0}^{k-1} g_i u(t-i) + e(t) \tag{B.7} $$
where e is a zero mean white noise with variance $\sigma^2$. We deal with the identification of the impulse responses $\theta = (g_{k-1}\ \cdots\ g_1\ g_0)^T$ of the FIR model based on the input-output data $\{u(t), y(t),\ t = 0, 1, \dots, N-1\}$. For notational simplicity, we define the stacked vectors
$$ y_{N-1} = \begin{bmatrix} y(k-1) \\ y(k) \\ \vdots \\ y(N-1) \end{bmatrix}, \quad e_{N-1} = \begin{bmatrix} e(k-1) \\ e(k) \\ \vdots \\ e(N-1) \end{bmatrix} \in \mathbb{R}^{N-k+1} $$
and the matrix
$$ U_{N-1} = \begin{bmatrix} u(0) & u(1) & \cdots & u(k-1) \\ u(1) & u(2) & \cdots & u(k) \\ \vdots & \vdots & & \vdots \\ u(N-k) & u(N-k+1) & \cdots & u(N-1) \end{bmatrix} \in \mathbb{R}^{(N-k+1)\times k} $$
Then, from (B.7), we have a linear regression model of the form
$$ y_{N-1} = U_{N-1}\theta + e_{N-1} \tag{B.8} $$
The least-squares estimate of $\theta$ for (B.8) is obtained by solving
$$ \min_\theta \|y_{N-1} - U_{N-1}\theta\| $$
Recall that Conditions A1) to A4) in Section A.1 are required for solving least-squares estimation problems. In particular, to get a unique solution, it is necessary to assume that $\mathrm{rank}(U_{N-1}) = k$. This condition is equivalent to
$$ \mathrm{rank}\begin{bmatrix} u(0) & u(1) & \cdots & u(N-k) \\ u(1) & u(2) & \cdots & u(N-k+1) \\ \vdots & \vdots & & \vdots \\ u(k-1) & u(k) & \cdots & u(N-1) \end{bmatrix} = k \tag{B.9} $$
It may be noted that the data length is finite in this case.


Definition B.1. [165] A deterministic sequence u with length N is PE of order k if (B.9) holds. If the input is a vector process $u \in \mathbb{R}^m$, then the rank condition (B.9) is replaced by $\mathrm{rank}(U_{N-1}) = km$.
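Condition (B.9) is a plain rank test on the data. The NumPy sketch below handles scalar signals and assumes an absolute singular-value threshold of 1e-8 to decide the numerical rank; it shows that a step is PE of order 1 only, a single sinusoid of order 2 only, and a white noise sequence of order (at least) 10 for this data length.

```python
import numpy as np

def is_pe(u, k, tol=1e-8):
    """Check (B.9): the k x (N-k+1) matrix with rows u(i), ..., u(N-k+i) has rank k."""
    N = len(u)
    M = np.array([u[i:N - k + 1 + i] for i in range(k)])
    return bool(np.linalg.matrix_rank(M, tol=tol) == k)

N = 200
step = np.ones(N)
sinusoid = np.sin(0.7 * np.arange(N) + 0.2)
white = np.random.default_rng(9).standard_normal(N)

print(is_pe(step, 1), is_pe(step, 2))            # True False
print(is_pe(sinusoid, 2), is_pe(sinusoid, 3))    # True False
print(is_pe(white, 10))                          # True
```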


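For a zero mean stationary process $u \in \mathbb{R}^m$, we define the covariance matrix by
$$ \Lambda_{uu}(k) = \lim_{N\to\infty}\frac{1}{N}U_{N-1}^TU_{N-1} = \begin{bmatrix} \Lambda_{uu}(0) & \Lambda_{uu}^T(1) & \cdots & \Lambda_{uu}^T(k-1) \\ \Lambda_{uu}(1) & \Lambda_{uu}(0) & \cdots & \Lambda_{uu}^T(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda_{uu}(k-1) & \Lambda_{uu}(k-2) & \cdots & \Lambda_{uu}(0) \end{bmatrix} \tag{B.10} $$
Then the PE condition for a stationary stochastic process is defined as follows.

Definition B.2. [109, 145] If $\Lambda_{uu}(k)$ of (B.10) is positive definite, then we say that u satisfies the PE condition of order k.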


The following example shows that when we deal with finite data, there always exist some ambiguities regarding how we treat boundary data.

Example B.2. Consider the step function treated in Example B.1. It can be shown that the step function is not PE of any order $k \ge 2$, since $\mathrm{rank}(U_{N-1}) = 1$.

However, in practice, step signals are often used for system identification. To consider this problem, we express (B.7) as
$$ \begin{aligned} y(0) &= g_0u(0) + g_1u(-1) + \cdots + g_{k-1}u(-k+1) + e(0) \\ y(1) &= g_0u(1) + g_1u(0) + \cdots + g_{k-1}u(-k+2) + e(1) \\ &\ \ \vdots \\ y(k-1) &= g_0u(k-1) + g_1u(k-2) + \cdots + g_{k-1}u(0) + e(k-1) \\ y(k) &= g_0u(k) + g_1u(k-1) + \cdots + g_{k-1}u(1) + e(k) \\ &\ \ \vdots \end{aligned} $$
Suppose that the system is at rest for t < 0. Then we have $u(t) = 0$, $t = -1, -2, \dots, -k+1$. Rearranging the above equations and assuming that $e(t) = 0$ for $t = 0, 1, \dots$, we get
$$ [y(0)\ \ y(1)\ \ \cdots\ \ y(N-1)] = [g_{k-1}\ \cdots\ g_1\ g_0]\begin{bmatrix} 0 & \cdots & 0 & u(0) & \cdots & u(N-k) \\ \vdots & & & & & \vdots \\ 0 & u(0) & \cdots & & \cdots & u(N-2) \\ u(0) & u(1) & \cdots & & \cdots & u(N-1) \end{bmatrix} $$
Thus if $u(t) = 1$, $t = 0, 1, \dots$, the wide rectangular matrix in the right-hand side of the above equation has rank k. Hence, by using the least-squares method, we can identify the impulse responses $g_0, g_1, \dots, g_{k-1}$. However, in this case, it should be understood that the estimate is obtained by using the additional information that $u(t) = 0$, $t = -1, -2, \dots$.


Example B.3. We consider the order of the PE condition for simple signals based on Definition B.2.

(a) Let u(t) be a zero mean white noise with variance $\sigma^2$. Then, for all k > 0, we see that $\Lambda_{uu}(k) = \sigma^2 I_k$ is positive definite. Thus the white noise satisfies the PE condition of order infinity.

(b) Consider a sinusoid $u(t) = A\sin(\lambda_0 t)$, $0 < \lambda_0 < \pi$. Then, the auto-covariance function is given by $\Lambda_{uu}(l) = (A^2/2)\cos(\lambda_0 l)$, so that
$$ \Lambda_{uu}(2) = \frac{A^2}{2}\begin{bmatrix} 1 & \cos\lambda_0 \\ \cos\lambda_0 & 1 \end{bmatrix}, \qquad \Lambda_{uu}(3) = \frac{A^2}{2}\begin{bmatrix} 1 & \cos\lambda_0 & \cos 2\lambda_0 \\ \cos\lambda_0 & 1 & \cos\lambda_0 \\ \cos 2\lambda_0 & \cos\lambda_0 & 1 \end{bmatrix} $$
We see that $\mathrm{rank}[\Lambda_{uu}(2)] = 2$, and $\mathrm{rank}[\Lambda_{uu}(k)] = 2$ for $k = 3, 4, \dots$. Hence the sinusoid has the PE condition of order two. This is obvious because a sinusoid has two independent parameters, a magnitude and a phase shift.
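The rank pattern in (b) can be confirmed numerically. The NumPy sketch below builds $\Lambda_{uu}(k)$ of (B.10) from the auto-covariance $(A^2/2)\cos(\lambda_0 l)$, with the arbitrary choices A = 1.5 and $\lambda_0 = 0.8$, and uses an absolute threshold of 1e-10 for the numerical rank.

```python
import numpy as np

def sinusoid_cov_matrix(amp, lam0, k):
    """k x k matrix of (B.10) with entries (amp^2/2) cos(lam0 (i - j)) for u(t) = amp sin(lam0 t)."""
    idx = np.arange(k)
    return 0.5 * amp**2 * np.cos(lam0 * np.subtract.outer(idx, idx))

ranks = [int(np.linalg.matrix_rank(sinusoid_cov_matrix(1.5, 0.8, k), tol=1e-10))
         for k in (1, 2, 3, 6)]
print(ranks)   # [1, 2, 2, 2]
```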


Lemma B.1. The PE conditions for some familiar stochastic processes are as follows.

(i) ARMA processes have the PE condition of order infinity.
(ii) The composite sinusoid of (B.4) satisfies the PE condition of order 2p.

Proof. [145] (i) Let u be a zero mean ARMA process with the spectral density function $\Phi_{uu}(\omega)$. Define $h = (h(0), h(1), \dots, h(l-1))^T$, and
$$ H(z) = h(0) + h(1)z^{-1} + \cdots + h(l-1)z^{-(l-1)} $$
Consider a process defined by $y = H(z)u$. Then we easily see that y is a zero mean second-order stationary process, so that the variance of y is given by
$$ E\{y^2(t)\} = E\Big\{\Big(\sum_{i=0}^{l-1}h(i)u(t-i)\Big)^2\Big\} = \sum_{i,j=0}^{l-1}h(i)h(j)\Lambda_{uu}(i-j) = h^T\Lambda_{uu}(l)h $$
It follows from Lemma 4.4 that
$$ h^T\Lambda_{uu}(l)h = \frac{1}{2\pi}\int_{-\pi}^{\pi}|H(e^{j\omega})|^2\Phi_{uu}(\omega)\,d\omega \tag{B.11} $$
Suppose that u does not satisfy the PE condition of order l. Then there exists a nonzero vector $h \in \mathbb{R}^l$ such that $h^T\Lambda_{uu}(l)h = 0$. Since the integrand of (B.11) is nonnegative, we have $|H(e^{j\omega})|^2\Phi_{uu}(\omega) = 0$ for all $\omega$¹. However, from (4.35), the spectral density function of the ARMA process is positive except at most at finitely many points. It therefore follows that $H(e^{j\omega}) = 0$ (a.e.), and hence h = 0. This is a contradiction, implying that the ARMA process satisfies the PE condition of order l. Since l is arbitrary, the ARMA process satisfies the PE condition of infinite order.
(ii) Since, as shown in Example B.3, a sinusoid has the PE condition of order two, the composite sinusoid of (B.4) has the PE condition of order 2p.

From Lemma B.1 (i), we can say that for a stationary process u, if
$$ \Phi_{uu}(\omega) > 0, \qquad -\pi < \omega \le \pi $$
is satisfied, then u is PE of order infinity. This condition has already been mentioned in Chapters 9 and 10.

¹The equality holds for $\omega \in (-\pi, \pi)$ almost everywhere (a.e.).


C 


Overlapping Parametrization 


In this section, we derive an overlapping parametrization for a stationary process; see also Example 1.2. From Theorems 4.3 and 4.4 (see Section 4.5), a zero mean regular full rank process $y \in \mathbb{R}^p$ can uniquely be expressed as
$$ y(t) = \sum_{i=0}^{\infty}H_ie(t-i) = \sum_{i=-\infty}^{t}H_{t-i}e(i) \tag{C.1} $$
where e is the innovation process with mean 0 and covariance matrix R > 0, and where $H_i$, i = 0, 1, ..., are impulse response matrices satisfying
$$ \sum_{i=0}^{\infty}\|H_i\|^2 < \infty, \qquad H_0 = I_p $$
Define the transfer matrix by
$$ H(z) = \sum_{i=0}^{\infty}H_iz^{-i} $$
Moreover, define
$$ \mathcal{Y}_t^- = \overline{\mathrm{span}}\{y(t-1), y(t-2), \dots\}, \qquad \mathcal{E}_t^- = \overline{\mathrm{span}}\{e(t-1), e(t-2), \dots\} $$
Then, it follows that $\mathcal{Y}_t^- = \mathcal{E}_t^-$, $t = 0, \pm 1, \dots$. In the following, we assume that both H(z) and $H^{-1}(z)$ are stable.
Let t be the present time. Then, from (C.1),
$$ y(t+k) = \sum_{i=t}^{t+k}H_{t+k-i}e(i) + \sum_{i=-\infty}^{t-1}H_{t+k-i}e(i), \qquad k = 0, 1, \dots \tag{C.2} $$
Thus we see that the first term in the right-hand side of the above equation is a linear combination of the future innovations $e(t), \dots, e(t+k)$ and that the second term is a linear combination of the past innovations $e(t-1), e(t-2), \dots$. Since $\mathcal{Y}_t^- = \mathcal{E}_t^-$, the second term is also expressed as a linear combination of the past outputs $y(t-1), y(t-2), \dots$, and hence it belongs to $\mathcal{Y}_t^-$. Thus it follows that the optimal predictor for $y(t+k)$ based on the past $\mathcal{Y}_t^-$ is given by (see Example 4.10)
$$ \hat y(t+k|t-1) = \sum_{i=-\infty}^{t-1}H_{t+k-i}e(i), \qquad k = 0, 1, \dots \tag{C.3} $$


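Repeated use of this relation yields
$$ \begin{bmatrix} \hat y(t|t-1) \\ \hat y(t+1|t-1) \\ \hat y(t+2|t-1) \\ \vdots \end{bmatrix} = \begin{bmatrix} H_1 & H_2 & H_3 & \cdots \\ H_2 & H_3 & H_4 & \cdots \\ H_3 & H_4 & H_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix} e(t-1) \\ e(t-2) \\ e(t-3) \\ \vdots \end{bmatrix} \tag{C.4} $$
It should be noted that this is a free response of the system with the initial state resulting from the past inputs e up to time $t-1$ (see also Section 6.2).
Let the block Hankel operator be
$$ \mathcal{H} = \begin{bmatrix} H_1 & H_2 & H_3 & \cdots \\ H_2 & H_3 & H_4 & \cdots \\ H_3 & H_4 & H_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} $$
where it is assumed that $\mathrm{rank}(\mathcal{H}) = n < \infty$. As shown in Section 8.3, the predictor space is defined by
$$ \mathcal{X}_t^- := \hat E\{\mathcal{Y}_t^+ \mid \mathcal{Y}_t^-\} = \overline{\mathrm{span}}\{\hat y(t+k|t-1),\ k = 0, 1, \dots\} $$
Thus, we can find n independent vectors from the infinite components
$$ \{\hat y_i(t+k|t-1),\ i = 1, \dots, p,\ k = 0, 1, \dots\} \tag{C.5} $$
where the n independent vectors form a basis of the predictor space $\mathcal{X}_t^-$.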
Suppose that $\hat y(t|t-1)$ has full rank, i.e., $\mathrm{cov}\{\hat y(t|t-1)\} > 0$. Then,
$$ \hat y_1(t|t-1),\ \hat y_2(t|t-1),\ \dots,\ \hat y_p(t|t-1) \tag{C.6} $$
are linearly independent, and hence we see that the first p rows of $\mathcal{H}$ are linearly independent.

Let $\nu = (n_1, \dots, n_p)$ be a set of p positive integers such that $n_1 + \cdots + n_p = n$. We pick n elements, including the p components of (C.6), from the infinite components defined by (C.5). Let such vectors be given by
$$ \begin{array}{llll} \hat y_1(t|t-1), & \hat y_1(t+1|t-1), & \dots, & \hat y_1(t+n_1-1|t-1) \\ \hat y_2(t|t-1), & \hat y_2(t+1|t-1), & \dots, & \hat y_2(t+n_2-1|t-1) \\ \quad\vdots & & & \\ \hat y_p(t|t-1), & \hat y_p(t+1|t-1), & \dots, & \hat y_p(t+n_p-1|t-1) \end{array} $$
Note that, for example, if $n_1 = 1$, then only $\hat y_1(t|t-1)$ is selected from the first row. If these n vectors are linearly independent, we call $\nu = (n_1, \dots, n_p)$ a multi-index; see [54, 68, 109] for more details.
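By using the above linearly independent components, we define a state vector of the system by
$$ x(t) := \begin{bmatrix} \hat y_1(t|t-1) \\ \vdots \\ \hat y_1(t+n_1-1|t-1) \\ \vdots \\ \hat y_p(t|t-1) \\ \vdots \\ \hat y_p(t+n_p-1|t-1) \end{bmatrix} \in \mathbb{R}^n \tag{C.7} $$
From (C.3), we get
$$ \hat y(t+k|t) = \sum_{i=-\infty}^{t}H_{t+k-i}e(i) = \sum_{i=-\infty}^{t-1}H_{t+k-i}e(i) + H_ke(t) = \hat y(t+k|t-1) + H_ke(t), \qquad k = 0, 1, \dots $$
In terms of the components, this can be written as
$$ \hat y_i(t+k|t) = \hat y_i(t+k|t-1) + h_{ik}e(t), \qquad k = 0, 1, \dots \tag{C.8} $$
where $i = 1, \dots, p$, and $h_{ik} = [h_{ik}(1)\ \cdots\ h_{ik}(p)] \in \mathbb{R}^{1\times p}$ is the ith row of $H_k$.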
Also, from (C.7), the state vector at t + 1 is expressed as
$$ x(t+1) = \begin{bmatrix} \hat y_1(t+1|t) \\ \vdots \\ \hat y_1(t+n_1|t) \\ \vdots \\ \hat y_p(t+1|t) \\ \vdots \\ \hat y_p(t+n_p|t) \end{bmatrix} \in \mathbb{R}^n $$
Thus from (C.8), we can verify that
$$ x(t+1) = \begin{bmatrix} \hat y_1(t+1|t-1) \\ \vdots \\ \hat y_1(t+n_1|t-1) \\ \vdots \\ \hat y_p(t+1|t-1) \\ \vdots \\ \hat y_p(t+n_p|t-1) \end{bmatrix} + \begin{bmatrix} h_{11}(1) & \cdots & h_{11}(p) \\ \vdots & & \vdots \\ h_{1n_1}(1) & \cdots & h_{1n_1}(p) \\ \vdots & & \vdots \\ h_{p1}(1) & \cdots & h_{p1}(p) \\ \vdots & & \vdots \\ h_{pn_p}(1) & \cdots & h_{pn_p}(p) \end{bmatrix}e(t) \tag{C.9} $$
Note that the first term in the right-hand side of (C.9) belongs to the space $\mathcal{X}_t^-$. In particular, we see that $\hat y_i(t+n_i|t-1)$, $i = 1, \dots, p$, are expressed in terms of a linear combination of the components of the basis vector x(t). Thus, we have
$$ \hat y_i(t+n_i|t-1) = \sum_{j=1}^{p}\sum_{k=1}^{n_j}\alpha_{ij}^{k}\,\hat y_j(t+k-1|t-1), \qquad i = 1, \dots, p \tag{C.10} $$
Other components $\hat y_i(t+l|t-1)$, $l = 1, \dots, n_i - 1$, are already contained in the vector x(t) as its elements, so that they are expressed in terms of shift operations. Moreover, putting k = 0 in (C.8) and noting that $\hat y(t|t) = y(t)$ and $H_0 = I_p$ yield
$$ y_i(t) = \hat y_i(t|t-1) + e_i(t), \qquad i = 1, \dots, p \tag{C.11} $$
where $\hat y_i(t|t-1)$ belongs to $\mathcal{X}_t^-$.

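For simplicity, we consider a 3-dimensional process y with a 9-dimensional state vector, and assume that $n_1 = 3$, $n_2 = 4$, $n_3 = 2$ with $n_1 + n_2 + n_3 = 9$. Then, by using (C.8) and (C.11), we have the following A- and C-matrices:
$$ A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ \alpha_{11}^1 & \alpha_{11}^2 & \alpha_{11}^3 & \alpha_{12}^1 & \alpha_{12}^2 & \alpha_{12}^3 & \alpha_{12}^4 & \alpha_{13}^1 & \alpha_{13}^2 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ \alpha_{21}^1 & \alpha_{21}^2 & \alpha_{21}^3 & \alpha_{22}^1 & \alpha_{22}^2 & \alpha_{22}^3 & \alpha_{22}^4 & \alpha_{23}^1 & \alpha_{23}^2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \alpha_{31}^1 & \alpha_{31}^2 & \alpha_{31}^3 & \alpha_{32}^1 & \alpha_{32}^2 & \alpha_{32}^3 & \alpha_{32}^4 & \alpha_{33}^1 & \alpha_{33}^2 \end{bmatrix} $$
and
$$ C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} $$
We can easily infer the forms of the A- and C-matrices for a general multi-index. Thus we have the following state space equation
$$ x(t+1) = Ax(t) + Ke(t) \tag{C.12a} $$
$$ y(t) = Cx(t) + e(t) \tag{C.12b} $$
where $K \in \mathbb{R}^{n\times p}$ is the coefficient matrix for e(t) in (C.9). We see that the number of unknown parameters in this Markov model is 2np: the p rows of A given by (C.10) contain np parameters, C consists of zeros and ones only, and K, which has no particular structure, contains np parameters.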

From the property of the block Hankel matrix, we have the following lemma.

Lemma C.1. [54, 109] Any n-dimensional stochastic LTI state space system can be expressed by means of a state space model (C.12) with some multi-index $\nu$. In other words, the state space model (C.12) with a particular multi-index $\nu$ can describe almost all n-dimensional stochastic LTI systems.


More precisely, let $\mathcal{M}_\nu(p)$ be the model structure of (C.12) with a multi-index $\nu$. Also, let the union of $\mathcal{M}_\nu(p)$ over all possible multi-indices be
$$ \mathcal{M}(p) = \bigcup_{\nu}\mathcal{M}_\nu(p) $$
Then, the set $\mathcal{M}(p)$ denotes the set of all n-dimensional linear stochastic systems with p outputs. Of course, the sets $\mathcal{M}_\nu(p)$ may overlap, but $\mathcal{M}(p)$ contains all the n-dimensional linear systems with p outputs.

The state space model (C.12) is called an overlapping parametrization with 2np independent parameters. Thus, we can employ the PEM to identify the 2np unknown parameters, but we need some complicated algorithms for switching from a particular $\nu$ to another $\nu'$ during the parameter identification, since we do not know the multi-index $\nu$ prior to identification.

In general, a p-dimensional process y with state dimension n is called generic if the state vector x is formed as in (C.7) using some multi-index $\nu = (n_1, \dots, n_p)$. The next example shows that there exist non-generic processes.


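Example C.1. [54] Let p = 2 and n = 3, and consider the following matrices:
$$ C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad A = \begin{bmatrix} \alpha & \beta & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad K = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} $$
where $\alpha\beta \neq 0$. Then, since $H_j = CA^{j-1}K$, $j = 1, 2, \dots$, we get
$$ H_1 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad H_2 = \begin{bmatrix} \beta & 0 \\ 0 & 1 \end{bmatrix}, \quad H_3 = \begin{bmatrix} \alpha\beta & \beta \\ 1 & 0 \end{bmatrix}, \quad H_4 = \begin{bmatrix} \alpha^2\beta + \beta & \alpha\beta \\ 0 & 1 \end{bmatrix} $$
Thus the first $3 \times 2$ block submatrix of $\mathcal{H}$ is given by
$$ \begin{bmatrix} H_1 & H_2 \\ H_2 & H_3 \\ H_3 & H_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 & \beta & 0 \\ 1 & 0 & 0 & 1 \\ \beta & 0 & \alpha\beta & \beta \\ 0 & 1 & 1 & 0 \\ \alpha\beta & \beta & \alpha^2\beta + \beta & \alpha\beta \\ 1 & 0 & 0 & 1 \end{bmatrix} $$
It is easy to see that the first two rows of the block Hankel matrix are linearly independent, but the third row is linearly dependent on the first two rows; indeed, (row 3) = $\alpha$(row 1) + $\beta$(row 2). Thus we observe that the selection $\nu = (2, 1)$ ($n_1 = 2$, $n_2 = 1$) does not yield a basis. Actually, in this case, we should pick the first two rows and the fourth row to form a basis.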
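The row dependence in Example C.1 can be checked numerically. The NumPy sketch below uses the matrices A, C, K of the example with the arbitrary values $\alpha = 0.5$ and $\beta = 2$ (so $\alpha\beta \neq 0$), forms a block Hankel matrix with three block rows and four block columns from $H_j = CA^{j-1}K$, and computes the ranks of selected rows.

```python
import numpy as np

alpha, beta = 0.5, 2.0                          # arbitrary values with alpha * beta != 0
A = np.array([[alpha, beta, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
K = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def markov(j):
    """H_j = C A^(j-1) K."""
    return C @ np.linalg.matrix_power(A, j - 1) @ K

# Block Hankel matrix [H_{i+j+1}], i = 0, 1, 2 (block rows), j = 0, ..., 3 (block columns)
Hk = np.block([[markov(i + j + 1) for j in range(4)] for i in range(3)])

def row_rank(rows):
    return int(np.linalg.matrix_rank(Hk[rows, :], tol=1e-9))

print(row_rank([0, 1]), row_rank([0, 1, 2]), row_rank([0, 1, 3]))   # 2 2 3
print(np.allclose(Hk[2], alpha * Hk[0] + beta * Hk[1]))             # True
```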


D 


List of Programs 


In Appendix D, some of the MATLAB® programs used in this book are listed.


D.1 Deterministic Realization Algorithm 


Table D.1 displays a program for the Ho-Kalman algorithm of Lemma 6.1, where it is assumed that k, l > n := rank(H).


Table D.1. Ho-Kalman’s algorithm 


% Function zeiger.m
% Lemma 6.1
function [A,B,C] = zeiger(H,p,m,n)
% p = dim(y); m = dim(u); n = dim(x)
% (p, m) are known
% H = kp x lm block Hankel matrix
% k, l > n; H must have finite rank
kp = size(H,1); lm = size(H,2);
[U,S,V] = svd(H); % Eq. (6.14)
n = rank(S); % if n is known, this is redundant
S1 = sqrtm(S(1:n,1:n));
% T = identity matrix % Eq. (6.15)
Ok = U(:,1:n)*S1; % extended observability matrix
Cl = S1*V(:,1:n)'; % extended reachability matrix
A = Ok(1:kp-p,:)\Ok(p+1:kp,:); % Eq. (6.16)
B = Cl(:,1:m);
C = Ok(1:p,:);
Table D.2. MOESP method

% Function moesp.m
% Lemma 6.6
% m = dim(u), p = dim(y), n = dim(x); k = number of block rows
% U = km x N input data matrix
% Y = kp x N output data matrix
function [A,B,C,D] = moesp(U,Y,m,p,n,k)
km = size(U,1); kp = size(Y,1);
L = triu(qr([U;Y]'))'; % LQ decomposition
L11 = L(1:km,1:km);
L21 = L(km+1:km+kp,1:km);
L22 = L(km+1:km+kp,km+1:km+kp);
[UU,SS,VV] = svd(L22); % Eq. (6.39)
U1 = UU(:,1:n); % n is known
Ok = U1*sqrtm(SS(1:n,1:n));
% Matrices A and C
C = Ok(1:p,1:n); % Eq. (6.41)
A = pinv(Ok(1:p*(k-1),1:n))*Ok(p+1:p*k,1:n); % Eq. (6.42)
% Matrices B and D
U2 = UU(:,n+1:end);
Z = U2'*L21/L11;
XX = []; RR = [];
for j = 1:k
    XX = [XX; Z(:,m*(j-1)+1:m*j)];
    Okj = Ok(1:p*(k-j),:);
    Rj = [zeros(p*(j-1),p) zeros(p*(j-1),n);
          eye(p) zeros(p,n); zeros(p*(k-j),p) Okj];
    RR = [RR; U2'*Rj];
end
DB = pinv(RR)*XX; % Eq. (6.44)
D = DB(1:p,:);
B = DB(p+1:end,:);


D.2 MOESP Algorithm 


Table D.2 displays a program for the basic MOESP method developed in [172, 173]. The formation of data matrices is omitted in this program, but Table D.3 contains a related method of constructing data matrices.

It should be noted that the way of computing the matrices A and C differs from method to method, but the way of computing B and D in the MOESP program of Table D.2 is shared by many (though not all) other subspace identification methods. Thus we can say that the differences among subspace identification algorithms lie in the way of computing A and C, or the image of the extended observability matrix, Im(O_k).
D.3 Stochastic Realization Algorithms


We show two stochastic realization algorithms, based on Lemma 7.9 in Section 7.7 and on Algorithm A in Section 8.7, respectively. It is instructive to compare the two algorithms.


Table D.3. Stochastic realization algorithm 


% Function stochastic.m
% Lemma 7.9
% function [A,C,Cb,K,R] = stochastic(y,n,k)
% y = [y(1),y(2),...,y(Ndat)]; p x Ndat matrix
% n = dim(x); k = number of block rows
function [A,C,Cb,K,R] = stochastic(y,n,k)
[p,Ndat] = size(y); N = Ndat-2*k;
ii = 0;
for i = 1:p:2*k*p-p+1
  ii = ii+1;
  Y(i:i+p-1,:) = y(:,ii:ii+N-1);
end
% Data matrix
Ypp = Y(1:k*p,:);
for i = 1:k
  j = (k-i)*p+1;
  Yp(j:j+p-1,:) = Ypp((i-1)*p+1:i*p,:); % Yp := Y check (block rows reversed)
end
Yf = Y(k*p+1:2*k*p,:);
Rfp = (Yf*Yp')/N; % Covariance matrix
[U,S,V] = svd(Rfp); % Eq. (7.81)
S2 = sqrtm(S(1:n,1:n));
Ok = U(:,1:n)*S2; % Eq. (7.82)
Ck = S2*V(:,1:n)';
A = Ok(1:k*p-p,:)\Ok(p+1:k*p,:); % Eq. (7.83)
C = Ok(1:p,:);
Cb = Ck(1:n,1:p)';
RR = (Yf*Yf')/N;
R0 = RR(1:p,1:p); % Variance of output
[P,L,G,Rept] = dare(A',C',zeros(n,n),-R0,-Cb'); % ARE (7.84)
K = G';
R = R0-C*P*C';


Table D.3 displays the stochastic realization algorithm of Lemma 7.9, in which the ARE is solved by the function dare. The function dare can solve the ARE appearing in stochastic realization as well as the one appearing in Kalman filtering; for details, see the documentation of dare.
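To see why the call dare(A',C',zeros(n,n),-R0,-Cb') in Table D.3 gives the ARE (7.84), one can match terms. This assumes the documented MATLAB convention that [X,L,G] = dare(a,b,q,r,s) solves the first equation below and returns the gain G shown; a, b, q, r, s are the dare arguments and $\Lambda_0$ stands for R0 in the program.

```latex
\begin{aligned}
&\texttt{dare}(a,b,q,r,s):\quad a^T X a - X - (a^T X b + s)(b^T X b + r)^{-1}(b^T X a + s^T) + q = 0,\\
&\qquad G = (b^T X b + r)^{-1}(b^T X a + s^T);\\
&a = A^T,\ b = C^T,\ q = 0,\ r = -\Lambda_0,\ s = -\bar C^T \ \Longrightarrow\\
&\qquad P = APA^T + (\bar C^T - APC^T)(\Lambda_0 - CPC^T)^{-1}(\bar C^T - APC^T)^T,\\
&\qquad K = G^T = (\bar C^T - APC^T)(\Lambda_0 - CPC^T)^{-1},\qquad R = \Lambda_0 - CPC^T.
\end{aligned}
```

The two sign flips in r and s cancel in the quadratic term, which is why the negated arguments appear in the call.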
Table D.4. Balanced stochastic realization (Algorithm A)


% Function stocha_bal.m
% Algorithm A in Section 8.7
% y = [y(1),y(2),...,y(Ndat)]; p x Ndat matrix
% n = dim(x); k = number of block rows
function [A,C,Cb,K,R] = stocha_bal(y,n,k)
[p,Ndat] = size(y); N = Ndat-2*k;
ii = 0;
for i = 1:p:2*k*p-p+1
  ii = ii+1; Y(i:i+p-1,:) = y(:,ii:ii+N-1);
end
Yp = Y(1:k*p,:); Yf = Y(k*p+1:2*k*p,:);
% LQ decomposition
H = [Yp; Yf]; [Q,L] = qr(H',0); L = L'/sqrt(N); % Eq. (8.76)
L11 = L(1:k*p,1:k*p); L21 = L(k*p+1:2*k*p,1:k*p);
L22 = L(k*p+1:2*k*p,k*p+1:2*k*p);
% Covariance matrices
Rff = (L21*L21'+L22*L22');
Rfp = L21*L11'; Rpp = L11*L11';
% Square roots & inverses
[Uf,Sf,Vf] = svd(Rff); [Up,Sp,Vp] = svd(Rpp);
Sf = sqrtm(Sf); Sp = sqrtm(Sp);
L = Uf*Sf*Vf'; M = Up*Sp*Vp'; % Eq. (8.77)
Sfi = inv(Sf); Spi = inv(Sp);
Linv = Vf*Sfi*Uf'; Minv = Vp*Spi*Up';
OC = Linv*Rfp*Minv';
[UU,SS,VV] = svd(OC); % Eq. (8.78)
Lambda = Rpp(1:p,1:p); % Covariance matrix of output
S = SS(1:n,1:n);
Ok = L*UU(:,1:n)*sqrtm(S); % Eq. (8.79)
Ck = sqrtm(S)*VV(:,1:n)'*M';
A = Ok(1:k*p-p,:)\Ok(p+1:k*p,:); % Eq. (8.80)
C = Ok(1:p,:); Cb = Ck(:,(k-1)*p+1:k*p)';
R = Lambda-C*S*C'; K = (Cb'-A*S*C')/R; % Eq. (8.81)


Table D.4 shows a program for Algorithm A of Section 8.7. The form of the data matrix $Y_p$ in Table D.4 is slightly different from that in Table D.3: in Table D.3, after generating $Y_p$, we formed $\check{Y}_p$ by reordering its block rows. Thus the way of computing $\bar{C}^T$ in Table D.4 differs from that in Table D.3. There is no theoretical difference, but the numerical results may differ slightly.

The program of Table D.4 is very simple since it does not solve an ARE, but the resulting $A - KC$ may be unstable. It should also be noted that we compute $L^{-1}$ and $M^{-1}$ by using pseudo-inverses. If the function chol were used for computing the matrix square roots, the program would stop unless $\Sigma_{ff}$ and $\Sigma_{pp}$ are positive definite, but these matrices may be rank deficient.
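The remark about rank deficiency can be illustrated with a small made-up covariance matrix (a sketch, not part of the original programs). chol reports failure on a matrix that is positive semidefinite but singular, whereas an SVD-based square root, as in Table D.4, still exists. Here pinv stands in for the pseudo-inverse mentioned in the text.

```matlab
% Rank-deficient covariance: two identical output channels (illustrative data)
Rff = [1 1; 1 1];
% With two outputs chol does not stop; flag > 0 means "not positive definite"
[~, flag] = chol(Rff);
% SVD-based square root; the diagonal square root avoids sqrtm's singularity warning
[Uf, Sf, Vf] = svd(Rff);
L = Uf*diag(sqrt(diag(Sf)))*Vf';
sqrt_err = norm(L*L' - Rff);     % close to zero: L is a valid square root
% inv of the square root would contain Inf here; the pseudo-inverse stays finite
Linv = pinv(L);
```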
The programs for the ORT and CCA methods derived in Sections 9.7 and 10.6 are displayed in Tables D.5 and D.6, respectively. A program for the PO-MOESP method is also given in Table D.7. Comparing the programs in Tables D.5 and D.7, we can easily see the difference between the ORT and PO-MOESP algorithms: both use the same LQ decomposition, but they use the L factors differently. For identifying B and D, the ORT uses the same method as the PO-MOESP.


Table D.5. Subspace identification of deterministic subsystem — ORT 


% Function ort_pk.m
% Subsection 9.7.1
function [A,B,C,D] = ort_pk(U,Y,m,p,n,k);
% ORT method by Picci and Katayama
km = size(U,1)/2; kp = size(Y,1)/2;
% LQ decomposition % Eq. (9.48)
L = triu(qr([U;Y]'))';
L11 = L(1:km,1:km);
L41 = L(2*km+kp+1:2*km+2*kp,1:km);
L42 = L(2*km+kp+1:2*km+2*kp,km+1:2*km);
% SVD % Eq. (9.52)
[UU,SS,VV] = svd(L42);
U1 = UU(:,1:n);
Ok = U1*sqrtm(SS(1:n,1:n));
C = Ok(1:p,1:n);
A = pinv(Ok(1:p*(k-1),1:n))*Ok(p+1:k*p,1:n); % Eq. (9.53)
% Matrices B and D
U2 = UU(:,n+1:size(UU',1));
Z = U2'*L41/L11; % Eq. (9.54)
% The program for computing B and D is the same
% as that of MOESP of Table D.2.
XX = [];
RR = [];
for j = 1:k
  XX = [XX; Z(:,m*(j-1)+1:m*j)];
  Okj = Ok(1:p*(k-j),:);
  Rj = [zeros(p*(j-1),p), zeros(p*(j-1),n);
        eye(p), zeros(p,n);
        zeros(p*(k-j),p), Okj];
  RR = [RR; U2'*Rj];
end
DB = pinv(RR)*XX;
D = DB(1:p,:);
B = DB(p+1:size(DB,1),:);
Table D.6. Stochastic subspace identification (CCA)


% Function cca.m
% Section 10.6 CCA Algorithm B
% y = [y(1),y(2),...,y(Ndat)]; p x Ndat matrix
% u = [u(1),u(2),...,u(Ndat)]; m x Ndat matrix
% n = dim(x); k = number of block rows
% Written by H. Kawauchi; modified by T. Katayama
function [A,B,C,D,K] = cca(y,u,n,k)
[p,Ndat] = size(y); [m,Ndat] = size(u); N = Ndat-2*k;
ii = 0;
for i = 1:m:2*k*m-m+1
  ii = ii+1; U(i:i+m-1,:) = u(:,ii:ii+N-1); % Data matrix
end
ii = 0;
for i = 1:p:2*k*p-p+1
  ii = ii+1;
  Y(i:i+p-1,:) = y(:,ii:ii+N-1); % Data matrix
end
Uf = U(k*m+1:2*k*m,:); Yf = Y(k*p+1:2*k*p,:);
Up = U(1:k*m,:); Yp = Y(1:k*p,:); Wp = [Up; Yp];
H = [Uf; Up; Yp; Yf];
[Q,L] = qr(H',0); L = L'; % LQ decomposition
L22 = L(k*m+1:k*(2*m+p),k*m+1:k*(2*m+p));
L32 = L(k*(2*m+p)+1:2*k*(m+p),k*m+1:k*(2*m+p));
L33 = L(k*(2*m+p)+1:2*k*(m+p),k*(2*m+p)+1:2*k*(m+p));
Rff = L32*L32'+L33*L33'; Rpp = L22*L22'; Rfp = L32*L22';
% SVD factors get new names so that the data matrices Uf and Up are not overwritten
[Ef,Sf,Gf] = svd(Rff); [Ep,Sp,Gp] = svd(Rpp);
Sf = sqrtm(Sf); Sfi = inv(Sf); Sp = sqrtm(Sp); Spi = inv(Sp);
Lfi = Gf*Sfi*Ef'; Lpi = Gp*Spi*Ep'; % Lf = Ef*Sf*Gf'; Lp = Ep*Sp*Gp'
OC = Lfi*Rfp*Lpi';
[UU,SS,VV] = svd(OC); % Normalized SVD
S1 = SS(1:n,1:n); U1 = UU(:,1:n); V1 = VV(:,1:n);
X = sqrtm(S1)*V1'*Lpi*Wp; XX = X(:,2:N); X = X(:,1:N-1);
U = Uf(1:m,1:N-1); Y = Yf(1:p,1:N-1);
ABCD = [XX;Y]/[X;U]; % System matrices
A = ABCD(1:n,1:n); B = ABCD(1:n,n+1:n+m);
C = ABCD(n+1:n+p,1:n); D = ABCD(n+1:n+p,n+1:n+m);
W = XX-A*X-B*U; E = Y-C*X-D*U;
SigWE = [W;E]*[W;E]'/(N-1);
QQ = SigWE(1:n,1:n); RR = SigWE(n+1:n+p,n+1:n+p);
SS = SigWE(1:n,n+1:n+p);
[P,L,G,Rept] = dare(A',C',QQ,RR,SS); % Kalman filter ARE
K = G'; % Kalman gain


The CCA method (Algorithm B) in Table D.6 is based on the use of estimates of the state vectors. It may be noted that the LQ decomposition in the above table is different from the one defined by (10.46); in fact, in the above program, the past input-output data $\begin{bmatrix} U_p \\ Y_p \end{bmatrix}$ is employed for $W_{0|k-1}$, since the row spaces of both data matrices are the same.


The following table shows a program of the PO-MOESP algorithm [171]. 


Table D.7. PO-MOESP algorithm 


% Function po_moesp.m
function [A,B,C,D] = po_moesp(U,Y,m,p,n,k);
% cf. Remark 9.3
% m = dim(u), p = dim(y), n = dim(x)
% k = number of block rows; U = 2km x N matrix; Y = 2kp x N matrix
km = k*m;
kp = k*p;
% LQ decomposition
L = triu(qr([U;Y]'))';
L11 = L(1:km,1:km);
L21 = L(km+1:2*km,1:km);
L22 = L(km+1:2*km,km+1:2*km);
L31 = L(2*km+1:2*km+kp,1:km);
L32 = L(2*km+1:2*km+kp,km+1:2*km);
L41 = L(2*km+kp+1:2*km+2*kp,1:km);
L42 = L(2*km+kp+1:2*km+2*kp,km+1:2*km);
L43 = L(2*km+kp+1:2*km+2*kp,2*km+1:2*km+kp);
[UU,SS,VV] = svd([L42 L43]);
U1 = UU(:,1:n);
Ok = U1*sqrtm(SS(1:n,1:n));
C = Ok(1:p,1:n);
A = pinv(Ok(1:p*(k-1),1:n))*Ok(p+1:k*p,1:n);
% Matrices B and D
U2 = UU(:,n+1:size(UU',1));
Z = U2'*[L31 L32 L41]/[L21 L22 L11];
% The rest is the same as that of MOESP of Table D.2.
% The subsequent part is omitted.


E 


Solutions to Problems 


Chapter 2 


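2.1 (a) Suppose that $\mathrm{rank}(A) = r$. Let $A = U\Sigma V^T$, where $\Sigma = \mathrm{diag}(\Sigma_r, 0)$ and $\Sigma_r \in \mathbb{R}^{r\times r}$, $\Sigma_r > 0$. Also, partition $U = [U_r\ \ U_r^{\perp}]$ and $V = [V_r\ \ V_r^{\perp}]$. From Lemma 2.9 (i), we see that $\mathrm{Im}(A) = \mathrm{Im}(U_r)$, $\mathrm{Ker}(A^T) = \mathrm{Im}(U_r^{\perp})$, $\mathrm{Im}(A^T) = \mathrm{Im}(V_r)$, and $\mathrm{Ker}(A) = \mathrm{Im}(V_r^{\perp})$. Item (a) is proved by using

$$\mathrm{Im}(U_r) \oplus \mathrm{Im}(U_r^{\perp}) = \mathbb{R}^m, \qquad \mathrm{Im}(U_r) \perp \mathrm{Im}(U_r^{\perp})$$

$$\mathrm{Im}(V_r) \oplus \mathrm{Im}(V_r^{\perp}) = \mathbb{R}^n, \qquad \mathrm{Im}(V_r) \perp \mathrm{Im}(V_r^{\perp})$$

(b) These are restatements of the relations in (a).
(c) We can prove the first relation of (c) as

$$\mathrm{Im}(AA^T) = \mathrm{Im}(U_r\Sigma_r^2U_r^T) = \mathrm{Im}(U_r) = \mathrm{Im}(A)$$

Also, the second relation is proved as follows ($\eta$ arbitrary):

$$A\,\mathrm{Im}(B) = \{Ax \mid x = B\eta\} = \{AB\eta\} = \mathrm{Im}(AB)$$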

2.2 Compute the product of the three matrices on the right-hand side.
2.3 (a) It suffices to compute the determinants of both sides of the relations in Problem 2.2. (b) This is obvious from (a). (c) Pre-multiplying the right-hand side of the formula by $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ yields the identity. (d) Comparing the (1,1)-blocks of the formula in (c) gives

$$(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$$

By changing the sign of $D$, we get the desired result.

2.4 Let $Px = \lambda x$, $x \neq 0$. Then $P(Px) = P(\lambda x) = \lambda^2 x$ holds. Hence, from $P^2 = P$, we have $Px = \lambda^2 x$. It thus follows that $\lambda x = \lambda^2 x$ for $x \neq 0$, implying that $\lambda = 0$ or $\lambda = 1$. Suppose that $\lambda_1 = \cdots = \lambda_r = 1$ and $\lambda_{r+1} = \cdots = \lambda_n = 0$. We see that $\mathrm{trace}(P) = \sum_{i=1}^{n}\lambda_i = r$. Let the SVD of $P$ be given by $P = U\Sigma V^T$, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$. Since, in this case, $\sigma_i = \lambda_i$, we see that $\mathrm{rank}(P) = \mathrm{rank}(\Sigma) = r$.

2.5 Suppose that $P^2 = P$ holds. Then, from Lemma 2.4 and Corollary 2.1, we have (2.17) and (2.18). Thus (a) implies (b).

We show (b) $\Rightarrow$ (c). As in the proof of Lemma 2.5, we define $\mathcal{V} = \mathrm{Im}(P)$ and $\mathcal{W} = \mathrm{Im}(I_n - P)$. Note that for the dimensions of subspaces, we have

$$\dim(\mathcal{V} \vee \mathcal{W}) = \dim(\mathcal{V}) + \dim(\mathcal{W}) - \dim(\mathcal{V} \cap \mathcal{W})$$

Since $x = Px + (I_n - P)x$, it follows that $\mathbb{R}^n = \mathcal{V} \vee \mathcal{W}$ and $n = \dim(\mathcal{V} \vee \mathcal{W})$. Also, from (b), we get $\dim(\mathcal{V}) + \dim(\mathcal{W}) = n$, and hence $\dim(\mathcal{V} \cap \mathcal{W}) = 0$. This implies that $\mathcal{V} \cap \mathcal{W} = \{0\}$, so that (c) holds.

Finally, we show (c) $\Rightarrow$ (a). Post-multiplying $I_n = P + (I_n - P)$ by $P$ yields $P = P^2 + (I_n - P)P$, so that we have $P(I_n - P) = (I_n - P)P$. Thus

$$\mathrm{Im}[P(I_n - P)] \subset \mathrm{Im}(P), \qquad \mathrm{Im}[(I_n - P)P] \subset \mathrm{Im}(I_n - P)$$

hold. If (c) holds, we get $\mathrm{Im}(P) \cap \mathrm{Im}(I_n - P) = \{0\}$, implying that $\mathrm{Im}[P(I_n - P)] = \{0\}$. Hence, we have $P^2 = P$. This completes the proof.


2.6 Since $LT = I$, we get $P^2 = TLTL = TL = P$. Also, $T$ and $L$ are of full rank, so that $\mathrm{Im}(P) = \mathrm{Im}(TL) = \mathrm{Im}(T)$ and $\mathrm{Ker}(P) = \mathrm{Ker}(TL) = \mathrm{Ker}(L)$. This implies that $P$ is the oblique projection onto $\mathrm{Im}(T)$ along $\mathrm{Ker}(L)$. Similarly, we can prove that $Q$ is a projection.
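A quick numerical check of Problems 2.4 and 2.6 can be written with made-up matrices (a sketch; M is an auxiliary matrix used only to produce an L with LT = I). The check confirms that P = TL is idempotent, that trace(P) = rank(P), and that P is not symmetric, so it is an oblique rather than an orthogonal projection.

```matlab
T = [1 0; 0 1; 1 1];          % 3 x 2, full column rank
M = [1 2 0; 0 1 3];           % 2 x 3 auxiliary matrix with M*T nonsingular
L = (M*T)\M;                  % then L*T = I_2
P = T*L;                      % projection onto Im(T) along Ker(L)
idem_err = norm(P*P - P);     % P is idempotent
r = rank(P);                  % equals 2
tr = trace(P);                % equals 2 up to rounding
sym_gap = norm(P - P');       % nonzero: P is an oblique projection
```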


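2.7 Define $L = [L_1\ \ L_2]$ and $V = [V_1\ \ V_2]$. Since

$$\begin{bmatrix} L_1 & L_2 \\ V_1 & V_2 \end{bmatrix}\begin{bmatrix} I_r & -X \\ 0 & I_{n-r} \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & I_{n-r} \end{bmatrix}$$

we have $L_1 = I_r$, $L_2 = X$, $V_1 = 0$, $V_2 = I_{n-r}$, and hence

$$L = [I_r\ \ X], \qquad V = [0\ \ I_{n-r}]$$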

2.8 (a) Let $P = V_rV_r^T$. Then, $P^2 = P$ and $P^T = P$ hold, so that $P$ is an orthogonal projection. Also, from Lemma 2.9, we have $\mathrm{Im}(A^T) = \mathrm{Im}(V_r) = \mathrm{Im}(V_rV_r^T)$. Similarly, we can prove (b), (c), and (d).


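2.9 Let $A = U\Sigma V^T$, where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal and

$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m\times n}$$

with $\Sigma_r \in \mathbb{R}^{r\times r}$ diagonal, where $r := \mathrm{rank}(A)$. Then, we get

$$(AA^T)^\dagger = (U\Sigma\Sigma^TU^T)^\dagger = U(\Sigma\Sigma^T)^\dagger U^T$$

where

$$(\Sigma\Sigma^T)^\dagger = \begin{bmatrix} \Sigma_r^{-2} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m\times m}$$

Thus

$$A^T(AA^T)^\dagger = V\Sigma^TU^TU(\Sigma\Sigma^T)^\dagger U^T = V\begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}U^T = A^\dagger$$

That $(A^TA)^\dagger A^T = A^\dagger$ is proved similarly.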
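Both identities of Problem 2.9 can be checked numerically on a rank-deficient example (a sketch with an arbitrary rank-one matrix), using the built-in pinv as the reference Moore-Penrose inverse.

```matlab
A = [1 2; 2 4; 3 6];                  % 3 x 2 matrix of rank 1
Ad = pinv(A);
err1 = norm(A'*pinv(A*A') - Ad);      % A^T (A A^T)^+ = A^+
err2 = norm(pinv(A'*A)*A' - Ad);      % (A^T A)^+ A^T = A^+
```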
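2.10 Let $A = U\Sigma V^T$, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, $U \in \mathbb{R}^{m\times n}$, and $V \in \mathbb{R}^{n\times n}$. Then, we have $Q = UV^T$ and $\Pi = V\Sigma V^T$. Note that $VV^T = V^TV = I_n$.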


Chapter 3 
3.1 Since $|g_k| = 1/k$, $k = 1, 2, \ldots$, we have

$$\sum_{k=1}^{\infty}|g_k| = \sum_{k=1}^{\infty}\frac{1}{k} = \infty$$

This implies that the system is not stable.


3.2 To apply the Routh-Hurwitz test for a continuous-time LTI system to a discrete-time LTI system, let $z = (s + 1)/(s - 1)$. Then, we see that $|z| < 1 \iff \mathrm{Re}[s] < 0$. From $f((s + 1)/(s - 1)) = 0$, we get

$$(1 + a_1 + a_2)s^2 + 2(1 - a_2)s + (1 - a_1 + a_2) = 0$$

Thus the stability condition for $z^2 + a_1z + a_2$ is given by

$$1 + a_1 + a_2 > 0, \qquad 1 - a_1 + a_2 > 0, \qquad 1 - a_2 > 0 \tag{E.1}$$
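Condition (E.1) can be cross-checked against the root moduli of $z^2 + a_1z + a_2$ on a grid of coefficient pairs. This is a sketch; the grid range, step and boundary tolerance are arbitrary choices, and points lying on the boundary of (E.1) are skipped because there the strict inequalities are decided by rounding.

```matlab
% Compare condition (E.1) with the root moduli of z^2 + a1 z + a2 on a grid
[a1, a2] = meshgrid(-2.5:0.1:2.5, -2.5:0.1:2.5);
mismatch = 0; checked = 0;
for idx = 1:numel(a1)
  c = [1 + a1(idx) + a2(idx), 1 - a1(idx) + a2(idx), 1 - a2(idx)];
  if min(abs(c)) < 1e-6
    continue;                                  % skip points on the boundary
  end
  stable_E1 = all(c > 0);                      % condition (E.1)
  stable_roots = max(abs(roots([1, a1(idx), a2(idx)]))) < 1;
  mismatch = mismatch + (stable_E1 ~= stable_roots);
  checked = checked + 1;
end
```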
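3.3 From the diagonal system of Figure 3.3, $A$ and $B$ are given by

$$A = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}, \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

Thus, from Theorem 3.4 (ii), it suffices to find the condition such that

$$\mathrm{rank}\begin{bmatrix} \lambda_1 - z & 0 & 0 & b_1 \\ 0 & \lambda_2 - z & 0 & b_2 \\ 0 & 0 & \lambda_3 - z & b_3 \end{bmatrix} = 3, \qquad z = \lambda_1, \lambda_2, \lambda_3$$

holds. Hence, the reachability condition becomes

$$b_1b_2b_3 \neq 0, \qquad (\lambda_1 - \lambda_2)(\lambda_2 - \lambda_3)(\lambda_3 - \lambda_1) \neq 0$$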
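3.4 Note that $\mathcal{C} = [b\ \ Ab\ \ \cdots\ \ A^{n-1}b]$. From (2.3), we have

$$A^n = -(a_1A^{n-1} + \cdots + a_{n-1}A + a_nI)$$

Hence,

$$\begin{aligned} A\mathcal{C} &= [Ab\ \ A^2b\ \ \cdots\ \ A^nb] \\ &= [Ab\ \ A^2b\ \ \cdots\ \ -(a_1A^{n-1} + \cdots + a_{n-1}A + a_nI)b] \\ &= [b\ \ Ab\ \ \cdots\ \ A^{n-1}b]\begin{bmatrix} 0 & & & -a_n \\ 1 & \ddots & & -a_{n-1} \\ & \ddots & 0 & \vdots \\ & & 1 & -a_1 \end{bmatrix} = \mathcal{C}\bar{A} \end{aligned}$$

and

$$b = [b\ \ Ab\ \ \cdots\ \ A^{n-1}b]\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \mathcal{C}\bar{b}$$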

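3.5 We can show that

$$\mathrm{rank}\,[A + BK - \lambda I\ \ B] = n \iff \mathrm{rank}\,[A - \lambda I\ \ B] = n$$

$$\mathrm{rank}\begin{bmatrix} A + LC - \lambda I \\ C \end{bmatrix} = n \iff \mathrm{rank}\begin{bmatrix} A - \lambda I \\ C \end{bmatrix} = n$$

since $[A + BK - \lambda I\ \ B] = [A - \lambda I\ \ B]\begin{bmatrix} I & 0 \\ K & I \end{bmatrix}$ and $\begin{bmatrix} A + LC - \lambda I \\ C \end{bmatrix} = \begin{bmatrix} I & L \\ 0 & I \end{bmatrix}\begin{bmatrix} A - \lambda I \\ C \end{bmatrix}$ with nonsingular block factors. The results follow from Theorems 3.4 to 3.9.

3.6 Define $\bar{A} := A/(\rho(A) + \epsilon)$. Then, the spectral radius of $\bar{A}$ is strictly less than 1, and hence $\bar{A}^k \to 0$ as $k \to \infty$. Thus, in particular, the elements of the sequence $\{\bar{A}^k,\ k = 1, 2, \ldots\}$ are bounded, so that we have $|(\bar{A}^k)_{ij}| \le C$, $C > 0$, for $k = 1, 2, \ldots$ and $i, j = 1, \ldots, n$. Since $(\bar{A}^k)_{ij} = (A^k)_{ij}/(\rho(A) + \epsilon)^k$, we get the desired result.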


3.7 Before proving this assertion, it will be helpful to look at the proof of a basic convergence result for the Cesàro sum in Problem 4.3 (a).
The solution $x(t)$ is given by

$$x(t) = A^tx(0) + \sum_{k=0}^{t-1}A^{t-k-1}f(k)$$

By assumption, $\rho(A) < 1$. Thus we can take $\epsilon > 0$ such that $\rho(A) + \epsilon =: \alpha < 1$. From Problem 3.6,

$$|(A^k)_{ij}| \le C(\rho(A) + \epsilon)^k \implies \|A^k\|_a \le C_1\alpha^k, \qquad k = 1, 2, \ldots$$

where $C_1 > 0$, and $\|\cdot\|_a$ is a matrix norm (see Section 2.3). By using the above estimate,

$$\|x(t)\| \le C_1\alpha^t\|x(0)\| + C_1\sum_{k=0}^{t-1}\alpha^{t-k-1}\|f(k)\|$$

Since the first term tends to zero as $t \to \infty$, it suffices to show that the second term tends to zero as $t \to \infty$. Let the second term be $g(t)$. Then, we get

$$g(t) = C_1\alpha^{t-1}\sum_{k=0}^{t-1}\alpha^{-k}\beta(k), \qquad \beta(k) := \|f(k)\|$$

By hypothesis, $\lim_{k\to\infty}\beta(k) = 0$, so that for any $\epsilon_1 > 0$, there exists $N_0 > 0$ such that $\beta(k) < \epsilon_1(1 - \alpha)/C_1$ for all $k > N_0$. Thus, for $t > N_0 + 1$,

$$\begin{aligned} g(t) &= C_1\alpha^{t-1}\left[\sum_{k=0}^{N_0}\alpha^{-k}\beta(k) + \sum_{k=N_0+1}^{t-1}\alpha^{-k}\beta(k)\right] \\ &< C_1\alpha^{t-1}\sum_{k=0}^{N_0}\alpha^{-k}\beta(k) + \epsilon_1(1 - \alpha)\sum_{k=N_0+1}^{t-1}\alpha^{t-1-k} \\ &= C_1\alpha^{t-1}\sum_{k=0}^{N_0}\alpha^{-k}\beta(k) + \epsilon_1\left(1 - \alpha^{t-N_0-1}\right) \end{aligned}$$

The first term on the right-hand side of the above inequality tends to zero as $t \to \infty$, while the second term is smaller than $\epsilon_1$. This completes the proof.
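3.8 It can be shown that the system matrix $S(z) = \begin{bmatrix} zI - A & B \\ -C & 0 \end{bmatrix}$ is factored as

$$\begin{bmatrix} zI - A & B \\ -C & 0 \end{bmatrix} = \begin{bmatrix} I_n & 0 \\ -C(zI - A)^{-1} & I_p \end{bmatrix}\begin{bmatrix} zI - A & B \\ 0 & G(z) \end{bmatrix}$$

where $G(z) = C(zI - A)^{-1}B$. Since $\mathrm{rank}\begin{bmatrix} I_n & 0 \\ -C(zI - A)^{-1} & I_p \end{bmatrix} = n + p$, we get

$$\mathrm{rank}_z\,S(z) = \mathrm{rank}_z(zI - A) + \mathrm{rank}_z\,G(z) = n + \mathrm{rank}_z\,G(z)$$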


3.9 ([51], vol. 2, pp. 206-207) Suppose that $R(z) = b(z)/a(z)$ is rational, and that the series expansion

$$\frac{b(z)}{a(z)} = \frac{h_1}{z} + \frac{h_2}{z^2} + \cdots \tag{E.2}$$

converges for $|z| > \rho$ for some $\rho > 0$. Suppose that the polynomials $a(z)$ and $b(z)$ are given by

$$a(z) = z^m + a_1z^{m-1} + \cdots + a_m, \qquad b(z) = b_1z^{m-1} + b_2z^{m-2} + \cdots + b_m$$

Multiplying (E.2) by $a(z)$ yields

$$b_1z^{m-1} + b_2z^{m-2} + \cdots + b_m = (z^m + a_1z^{m-1} + \cdots + a_m)\left(\frac{h_1}{z} + \frac{h_2}{z^2} + \cdots\right)$$

Equating the coefficients of equal powers of $z$ on both sides, we obtain

$$b_1 = h_1, \quad b_2 = h_2 + a_1h_1, \quad \ldots, \quad b_m = h_m + a_1h_{m-1} + \cdots + a_{m-1}h_1$$

and for $j = m + 1, \ldots$,

$$0 = h_j + a_1h_{j-1} + \cdots + a_mh_{j-m}$$

This implies that (2.40) holds with $r = m$, so that the Hankel matrix (2.35) has finite rank.

Conversely, if $H$ has finite rank, then (2.40) holds from Lemma 2.14. Hence, by using $a_1, \ldots, a_r$ of (2.40) and the above relations, we can define $b_1, \ldots, b_r$. Thus we see that $b(z)/a(z)$ is a desired rational function, which equals $R(z) = h_1/z + h_2/z^2 + \cdots$.

3.10 Note the following power series expansion:

$$\log(1 + z^{-1}) = z^{-1} - \frac{1}{2}z^{-2} + \frac{1}{3}z^{-3} - \cdots, \qquad |z| > 1$$

Thus the right-hand side converges to a non-rational transfer function, implying that the impulse response cannot be realized by a finite-dimensional state space model.


Chapter 4 

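4.1 Putting $i - j = k$, we change variables from $(i, j)$ to $(j, k)$. Then, $k$ is bounded by $-N + 1 \le k \le N - 1$, and $j$ is bounded by $1 \le j \le N - k$ if $k \ge 0$ and by $-k + 1 \le j \le N$ if $k < 0$. Thus we get

$$\begin{aligned} \sum_{i=1}^{N}\sum_{j=1}^{N}\phi(i - j) &= \sum_{k=0}^{N-1}\sum_{j=1}^{N-k}\phi(k) + \sum_{k=-N+1}^{-1}\sum_{j=-k+1}^{N}\phi(k) \\ &= \sum_{k=0}^{N-1}(N - k)\phi(k) + \sum_{k=-N+1}^{-1}(N + k)\phi(k) \\ &= \sum_{k=-N+1}^{N-1}(N - |k|)\phi(k) \end{aligned}$$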
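The change of summation variables in Problem 4.1 is easy to confirm numerically for a particular sequence. In this sketch, phi is an arbitrary made-up test sequence and N is fixed small.

```matlab
% Check of sum_{i,j=1}^N phi(i-j) = sum_{|k|<N} (N-|k|) phi(k)
N = 7;
phi = @(k) exp(-0.3*abs(k)) .* cos(0.7*k);   % arbitrary test sequence
lhs = 0;
for i = 1:N
  for j = 1:N
    lhs = lhs + phi(i - j);                  % direct double sum
  end
end
k = -(N-1):(N-1);
rhs = sum((N - abs(k)) .* phi(k));           % single weighted sum
```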


4.2 Define $k = t - s$. Then, applying the formula in Problem 4.1, we have

$$\sum_{t=-N}^{N}\sum_{s=-N}^{N}\Lambda(t - s) = \sum_{k=-2N}^{2N}(2N + 1 - |k|)\Lambda(k)$$

Thus dividing the above equation by $2N + 1$ gives (4.13).
4.3 (a) Let $\epsilon > 0$ be a small number. From the assumption, there exists an integer $p > 0$ such that $|a_k| < \epsilon$ for $k > p$. Let $M = \max\{|a_1|, \ldots, |a_p|\}$. Then,

$$\left|\frac{a_1 + \cdots + a_n}{n}\right| \le \frac{pM + \epsilon(n - p)}{n} \le \frac{pM}{n} + \epsilon$$

Taking the limit $n \to \infty$, we have $pM/n \to 0$. Since $\epsilon > 0$ is arbitrary, the assertion is proved.

(b) Define $B_n = \frac{1}{n}\sum_{k=1}^{n}a_k$ with $B_0 = 0$. Then, $\lim_{n\to\infty}|B_n| = 0$ by hypothesis. Noting that

$$ka_k = k^2B_k - (k - 1)^2B_{k-1} - (k - 1)B_{k-1}$$

we have

$$\sum_{k=1}^{n}ka_k = n^2B_n - \sum_{k=1}^{n}(k - 1)B_{k-1}$$

Thus

$$|I_n| \le \sum_{k=1}^{n}\frac{k - 1}{n^2}|B_{k-1}| \le \frac{1}{n}\sum_{k=1}^{n}|B_{k-1}| \to 0$$

since $\lim_{k\to\infty}|B_k| = 0$ (see (a)).

(c) Define $C_n = \sum_{k=n}^{\infty}a_k$. By assumption, $\lim_{n\to\infty}C_n = 0$. It can be shown that

$$\left|\sum_{k=1}^{\infty}a_k - \sum_{k=1}^{n}\left(1 - \frac{k}{n}\right)a_k\right| \le \left|\sum_{k=n+1}^{\infty}a_k\right| + \frac{1}{n}\left|\sum_{k=1}^{n}ka_k\right| = |C_{n+1}| + \frac{1}{n}\left|\sum_{k=1}^{n}ka_k\right|$$

Since the first term on the right-hand side of the above equation converges to zero, it remains to prove the convergence of the second term. By the definition of $C_n$,

$$\frac{1}{n}\sum_{k=1}^{n}ka_k = \frac{1}{n}\sum_{k=1}^{n}k(C_k - C_{k+1}) = \frac{1}{n}\sum_{k=1}^{n}\left([kC_k - (k + 1)C_{k+1}] + C_{k+1}\right) = \frac{C_1}{n} - \frac{n + 1}{n}C_{n+1} + \frac{1}{n}\sum_{k=1}^{n}C_{k+1}$$

We see that the first and second terms of the right-hand side of the above equation converge to zero, and the third term also converges to zero by (a).
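4.4 For zero-mean Gaussian random variables $a, b, c, d$, we have (see e.g. [145])

$$E\{abcd\} = E\{ab\}E\{cd\} + E\{ac\}E\{bd\} + E\{ad\}E\{bc\} \tag{E.3}$$

By using (E.3), it follows from (4.17) that

$$\begin{aligned} \Lambda_{\xi\xi}(k) &= E\{x(t + l + k)x(t + k)x(t + l)x(t)\} - \mu_\xi^2 \\ &= \Lambda_{xx}(l)\Lambda_{xx}(l) + \Lambda_{xx}(k)\Lambda_{xx}(k) + \Lambda_{xx}(l + k)\Lambda_{xx}(l - k) - \mu_\xi^2 \\ &= \Lambda_{xx}^2(k) + \Lambda_{xx}(l + k)\Lambda_{xx}(l - k) \end{aligned}$$

By the Schwarz inequality,

$$\left|\sum_{k=0}^{N}\Lambda_{\xi\xi}(k)\right| \le \sum_{k=0}^{N}\Lambda_{xx}^2(k) + \left|\sum_{k=0}^{N}\Lambda_{xx}(k + l)\Lambda_{xx}(k - l)\right| \le \sum_{k=0}^{N}\Lambda_{xx}^2(k) + \left(\sum_{k=0}^{N}\Lambda_{xx}^2(k + l)\sum_{k=0}^{N}\Lambda_{xx}^2(k - l)\right)^{1/2} \tag{E.4}$$

Since (4.20) holds, it follows that

$$\lim_{N\to\infty}\frac{1}{N + 1}\sum_{k=0}^{N}\Lambda_{xx}^2(k + l) = 0, \qquad l = 0, 1, \ldots$$

and hence from (E.4)

$$\lim_{N\to\infty}\frac{1}{N + 1}\sum_{k=0}^{N}\Lambda_{\xi\xi}(k) = 0$$

This implies that (4.19) holds from Problem 4.3 (c).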


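4.5 Similarly to the calculation in Problem 4.2, we have

$$\frac{1}{2N + 1}\sum_{t=-N}^{N}\sum_{s=-N}^{N}\Lambda(t - s)e^{-j\omega(t - s)} = \sum_{\tau=-2N}^{2N}\left(1 - \frac{|\tau|}{2N + 1}\right)\Lambda(\tau)e^{-j\omega\tau} \tag{E.5}$$

Note that

$$\lim_{N\to\infty}\sum_{\tau=-2N}^{2N}\Lambda(\tau)e^{-j\omega\tau} = \Phi(\omega)$$

exists. It therefore follows from Problem 4.3 (c) that the right-hand side of (E.5) converges to $\Phi(\omega)$.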


4.6 A proof is similar to that of Lemma 4.4. Post-multiplying

$$y(t) = \sum_{k=0}^{\infty}g_ku(t - k)$$

by $u(t - l)$ and taking the expectation yield

$$\Lambda_{yu}(l) = \sum_{k=0}^{\infty}g_kE\{u(t - k)u(t - l)\} = \sum_{k=0}^{\infty}g_k\Lambda_{uu}(l - k)$$

Post-multiplying the above equation by $e^{-j\omega l}$ and summing up with respect to $l$ yield

$$\Phi_{yu}(\omega) = \sum_{l=-\infty}^{\infty}\sum_{k=0}^{\infty}g_k\Lambda_{uu}(l - k)e^{-j\omega l} = \sum_{k=0}^{\infty}g_ke^{-j\omega k}\Phi_{uu}(\omega) = G(e^{j\omega})\Phi_{uu}(\omega)$$
k=0 
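4.7 Since $\Phi(\omega) = 2 - 2\cos\omega = 4\sin^2(\omega/2)$,

$$\begin{aligned} \int_{-\pi}^{\pi}\log\Phi(\omega)\,d\omega &= 2\int_{0}^{\pi}\log[4\sin^2(\omega/2)]\,d\omega \\ &= 4\pi\log 2 + 4\int_{0}^{\pi}\log\sin(\omega/2)\,d\omega \\ &= 4\pi\log 2 + 8\int_{0}^{\pi/2}\log\sin\theta\,d\theta = 0 > -\infty \end{aligned}$$

where $\int_{0}^{\pi/2}\log\sin\theta\,d\theta = -(\pi/2)\log 2$ (Euler) is used.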
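Both integrals in Problem 4.7 can be checked with a plain midpoint rule (a sketch; the number of subintervals is an arbitrary choice). The midpoints never land on the integrable logarithmic singularities at the endpoints, so the rule can be applied directly, with an error of order one step size.

```matlab
% Midpoint rule for int_{-pi}^{pi} log(4 sin^2(w/2)) dw (integrand is even in w)
M = 200000;                          % number of subintervals on [0, pi]
h = pi/M;
w = ((1:M) - 0.5)*h;                 % midpoints in (0, pi)
I = 2*h*sum(log(4*sin(w/2).^2));     % expected to be close to 0
% Euler's integral int_0^{pi/2} log(sin t) dt = -(pi/2) log 2, same rule
ht = (pi/2)/M;
t = ((1:M) - 0.5)*ht;
E = ht*sum(log(sin(t)));
```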
4.8 The form of $\Phi(\omega)$ implies that $y$ is a one-dimensional ARMA process, so that

$$y(t) + ay(t - 1) = e(t) + ce(t - 1)$$

Thus from (4.35), the spectral density function of $y$ becomes

$$\Phi(\omega) = \sigma^2\left|\frac{1 + ce^{-j\omega}}{1 + ae^{-j\omega}}\right|^2 = \sigma^2\,\frac{1 + c^2 + 2c\cos\omega}{1 + a^2 + 2a\cos\omega}$$

Comparing the coefficients, we have $a = -0.9$, $c = 0.5$.
4.9 Since $H(z) = (z + c)/(z + a)$, we have

$$z^mH(z) = \frac{z^m(z + c)}{z + a} = (z^m + cz^{m-1})(1 + (-a)z^{-1} + (-a)^2z^{-2} + \cdots)$$

Computing $[z^mH(z)]_+$ yields

$$\begin{aligned} \bigl[z^mH(z)\bigr]_+ &= [(-a)^m + c(-a)^{m-1}] + [(-a)^{m+1} + c(-a)^m]z^{-1} + \cdots \\ &= (-a)^{m-1}(c - a)(1 + (-a)z^{-1} + (-a)^2z^{-2} + \cdots) \\ &= \frac{(-a)^{m-1}(c - a)z}{z + a} \end{aligned}$$

Thus from (4.57), the optimal predictor is given by

$$Q(z) = \frac{(-a)^{m-1}(c - a)z}{z + a}\cdot\frac{z + a}{z + c} = \frac{(-a)^{m-1}(c - a)z}{z + c}$$
4.10 From (4.58), the joint process $(x, y)$ satisfies a state space equation, implying that the joint process $(x, y)$ is Markov.
4.11 A proof is by direct substitution. 
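4.12 By definition,

$$\Phi_{yy}(z) = \sum_{l=-\infty}^{-1}\bar{C}(A^T)^{-l-1}C^Tz^{-l} + \Lambda_{yy}(0) + \sum_{l=1}^{\infty}CA^{l-1}\bar{C}^Tz^{-l}$$

Since $\bar{C}^T = A\Pi C^T + S$, we compute the terms that include $S$. With $W(z) = C(zI - A)^{-1}$,

$$\begin{aligned} \sum_{l=-\infty}^{-1}S^T(A^T)^{-l-1}C^Tz^{-l} + \sum_{l=1}^{\infty}CA^{l-1}Sz^{-l} &= S^T\left(\sum_{l=1}^{\infty}(A^T)^{l-1}z^{l}\right)C^T + C\left(\sum_{l=1}^{\infty}A^{l-1}z^{-l}\right)S \\ &= S^T(z^{-1}I - A^T)^{-1}C^T + C(zI - A)^{-1}S \\ &= S^TW^T(z^{-1}) + W(z)S \end{aligned}$$

Adding these terms to the right-hand side of (4.80) yields (4.81).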
Chapter 5 


5.1 This is a special case of Lemma 5.1. 


5.2 Let $K_\alpha(t)$ and $P_\alpha(t)$ respectively be the Kalman gain and the error covariance matrices corresponding to $\alpha Q(t)$, $\alpha S(t)$, $\alpha R(t)$, $\alpha P(0)$. We use the algorithm of Theorem 5.1. For $t = 0$, it follows from (5.41a) that

$$K_\alpha(0) = [A(0)\alpha P(0)C^T(0) + B(0)\alpha S(0)][C(0)\alpha P(0)C^T(0) + \alpha R(0)]^{-1} = K(0)$$

Also, from (5.42a),

$$P_\alpha(1) = A(0)\alpha P(0)A^T(0) - K_\alpha(0)[C(0)\alpha P(0)C^T(0) + \alpha R(0)]K_\alpha^T(0) + B(0)\alpha Q(0)B^T(0) = \alpha P(1)$$

Similarly, for $t = 1$, we have $K_\alpha(1) = K(1)$ and $P_\alpha(2) = \alpha P(2)$, and hence inductively $K_\alpha(t) = K(t)$, $t = 2, 3, \ldots$.
5.3 It follows that

$$x(t) - \mu(t) = \tilde{x}(t \mid t - 1) + (\hat{x}(t \mid t - 1) - \mu(t))$$

where $\tilde{x}(t \mid t - 1) \perp \hat{x}(t \mid t - 1) - \mu(t)$. Thus we have

$$\Pi(t) = P(t \mid t - 1) + \Sigma(t)$$

Since $\Sigma(t) \ge 0$ and $P(t \mid t - 1) \ge P(t \mid t) \ge 0$, we get

$$\Pi(t) \ge P(t \mid t - 1) \ge P(t \mid t) \ge 0, \qquad \Pi(t) \ge \Sigma(t)$$


5.4 Follow the hint. 
5.5 The derivation is straightforward.
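5.6 Substituting $A = \Phi + SR^{-1}C$ into (5.68), we have

$$K = [(\Phi + SR^{-1}C)PC^T + S](CPC^T + R)^{-1} = \Phi PC^T(CPC^T + R)^{-1} + SR^{-1} = \Gamma + SR^{-1}$$

Thus we get $A - KC = \Phi - \Gamma C$.
It follows from (5.67) that

$$\begin{aligned} P &= APA^T - K(CPC^T + R)K^T + Q \\ &= APA^T - (\Gamma + SR^{-1})(CPC^T + R)(\Gamma + SR^{-1})^T + Q \\ &= (\Phi + SR^{-1}C)P(\Phi + SR^{-1}C)^T - (\Gamma + SR^{-1})(CPC^T + R)(\Gamma + SR^{-1})^T + Q \end{aligned}$$

From the definition of $\Gamma$,

$$\begin{aligned} P &= \Phi P\Phi^T - \Gamma(CPC^T + R)\Gamma^T + Q + SR^{-1}CP\Phi^T \\ &\quad + \Phi PC^TR^{-1}S^T + SR^{-1}CPC^TR^{-1}S^T \\ &\quad - \Gamma(CPC^T + R)R^{-1}S^T - SR^{-1}(CPC^T + R)\Gamma^T \\ &\quad - SR^{-1}(CPC^T + R)R^{-1}S^T \\ &= \Phi P\Phi^T - \Gamma(CPC^T + R)\Gamma^T + (Q - SR^{-1}S^T) \end{aligned}$$

This proves (5.70) since $M = Q - SR^{-1}S^T$.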
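5.7 Equation (5.90) is given by

$$\Sigma = A\Sigma A^T + (\bar{C}^T - A\Sigma C^T)(\Lambda(0) - C\Sigma C^T)^{-1}(\bar{C}^T - A\Sigma C^T)^T \tag{E.6}$$

Using $A = F + \bar{C}^T\Lambda^{-1}(0)C$, the first term on the right-hand side of (E.6) is

$$\begin{aligned} I_1 &:= (F + \bar{C}^T\Lambda^{-1}(0)C)\Sigma(F + \bar{C}^T\Lambda^{-1}(0)C)^T \\ &= F\Sigma F^T + \bar{C}^T\Lambda^{-1}(0)C\Sigma F^T + F\Sigma C^T\Lambda^{-1}(0)\bar{C} \\ &\quad + \bar{C}^T\Lambda^{-1}(0)C\Sigma C^T\Lambda^{-1}(0)\bar{C} \end{aligned}$$

Also, we have

$$\bar{C}^T - A\Sigma C^T = -F\Sigma C^T + \bar{C}^T\Lambda^{-1}(0)(\Lambda(0) - C\Sigma C^T)$$

so that the second term on the right-hand side of (E.6) becomes

$$\begin{aligned} I_2 &:= (-F\Sigma C^T + \bar{C}^T\Lambda^{-1}(0)[\Lambda(0) - C\Sigma C^T])(\Lambda(0) - C\Sigma C^T)^{-1} \\ &\quad \times (-F\Sigma C^T + \bar{C}^T\Lambda^{-1}(0)[\Lambda(0) - C\Sigma C^T])^T \\ &= F\Sigma C^T(\Lambda(0) - C\Sigma C^T)^{-1}C\Sigma F^T - F\Sigma C^T\Lambda^{-1}(0)\bar{C} \\ &\quad - \bar{C}^T\Lambda^{-1}(0)C\Sigma F^T + \bar{C}^T\Lambda^{-1}(0)(\Lambda(0) - C\Sigma C^T)\Lambda^{-1}(0)\bar{C} \end{aligned}$$

Computing $I_1 + I_2$, we get (5.91).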


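5.8 Since

$$N = \begin{bmatrix} F^T & 0 \\ -C^T\Lambda^{-1}(0)C & I_n \end{bmatrix}, \qquad L = \begin{bmatrix} I_n & \bar{C}^T\Lambda^{-1}(0)\bar{C} \\ 0 & F \end{bmatrix}$$

we have, with $J = \begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}$,

$$LJL^T = \begin{bmatrix} 0 & -F^T \\ F & 0 \end{bmatrix} = NJN^T$$

Consider the following two eigenvalue problems:

(A) $Nx = \lambda Lx$ (B) $L^Tx = \mu N^Tx$

Let $\lambda \neq 0$ be an eigenvalue of Problem (A). Since

$$\det(L^T - \mu N^T) = \det(L - \mu N) = 0, \qquad \mu = 1/\lambda$$

we see that $\mu = 1/\lambda$ is an eigenvalue of Problem (B). Also, pre-multiplying $L^Tx = \mu N^Tx$ by $NJ$ yields

$$NJL^Tx = \mu NJN^Tx = \mu LJL^Tx \implies Nz = \mu Lz, \qquad z = JL^Tx$$

Thus $\mu = 1/\lambda$ is also an eigenvalue of Problem (A).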


Chapter 6 
6.1 (a) Since $g_k = k$, $k = 1, 2, \ldots$, and $g_0 = 0$, we have

$$H_{44} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \end{bmatrix}, \qquad \mathrm{rank}\,H_{44} = 2$$

By using the MATLAB® program in Table D.1, we get

$$A = \begin{bmatrix} 1.2182 & -0.2182 \\ 0.2182 & 0.7818 \end{bmatrix}, \qquad B = \begin{bmatrix} -1.3039 \\ 0.8368 \end{bmatrix}, \qquad C = [-1.3039\ \ -0.8368]$$

Thus the transfer function is given by $G(z) = z/(z - 1)^2$.
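The realization in (a) can be reproduced with a short self-contained sketch that follows the same SVD-based steps as a Ho-Kalman program. It is not a copy of Table D.1. The signs of the SVD factors, and hence of $B$ and $C$, may differ from the printed values depending on the MATLAB version; the Markov parameters $CA^{j-1}B$ do not depend on those signs.

```matlab
% Ho-Kalman realization of g_k = k (k = 1,...,7) from the 4 x 4 Hankel matrix
g = 1:7;
H = hankel(g(1:4), g(4:7));        % H(i,j) = g(i+j-1)
n = rank(H);                       % equals 2
[U, S, V] = svd(H);
S1 = sqrtm(S(1:n,1:n));
Ok = U(:,1:n)*S1;                  % extended observability matrix
Ck = S1*V(:,1:n)';                 % extended reachability matrix
A = Ok(1:3,:)\Ok(2:4,:);           % shift invariance (p = 1)
B = Ck(:,1);
C = Ok(1,:);
gh = zeros(1,7);
for j = 1:7
  gh(j) = C*A^(j-1)*B;             % Markov parameters of the realization
end
```

Since $A$ is similar to a $2\times 2$ Jordan block with eigenvalue 1, its trace is 2 and its determinant is 1, which agrees with $G(z) = z/(z-1)^2$.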


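(b) In this case, the Hankel matrix becomes

$$H_{66} = \begin{bmatrix} 1 & 0 & -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & 1 \end{bmatrix}, \qquad \mathrm{rank}\,H_{66} = 4$$

so that we have

$$A = \begin{bmatrix} 0.1450 & 0.8808 & -0.3327 & -0.3239 \\ -0.8808 & 0.3551 & 0.3533 & -0.0115 \\ 0.3327 & -0.3533 & -0.6187 & -0.5087 \\ 0.3239 & -0.0115 & 0.5087 & -0.8814 \end{bmatrix}, \qquad B = \begin{bmatrix} -1.0016 \\ 0.1151 \\ -0.2418 \\ 0.2200 \end{bmatrix}$$

$$C = [-1.0016\ \ -0.1151\ \ -0.2418\ \ -0.2200]$$

Thus the transfer function is given by $G(z) = (z^3 + z^2)/(z^4 + z^3 + z^2 + z + 1)$.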


6.2 Let $P$ be the reachability Gramian. Substituting $A = SA^TS$, $B = SC^T$ into (3.34) yields

$$P = APA^T + BB^T = SA^TSPSAS + SC^TCS$$

Since $SS = I$, we get $SPS = A^T(SPS)A + C^TC$. Thus the observability Gramian is expressed as $Q = SPS$. Though $(A, B, C)$ is not balanced, both Gramians have the same eigenvalues. Note that $\Sigma_k = \mathcal{C}_k\mathcal{C}_k^T$ is diagonal (since $V^TV = I$), and that

$$\Sigma_k = \sum_{i=0}^{k-1}A^iBB^T(A^T)^i \to \sum_{i=0}^{\infty}A^iBB^T(A^T)^i = P, \qquad k \to \infty$$


6.3 Since the orthogonal projection is expressed as $\hat{E}\{A \mid B\} = KB$, $K \in \mathbb{R}^{p\times q}$, the optimality condition is reduced to $A - KB \perp B$. Hence we have

$$(A - KB)B^T = 0 \implies K = (AB^T)(BB^T)^\dagger$$

showing that $\hat{E}\{A \mid B\} = (AB^T)(BB^T)^\dagger B$.
6.4 Since $Q_1^TQ_2 = 0$, the two terms on the right-hand side of $A = L_{21}Q_1^T + L_{22}Q_2^T$ are orthogonal. From $B = L_{11}Q_1^T$ with $B$ of full row rank, we see that $L_{11}$ is nonsingular and $Q_1^T$ forms a basis of the space spanned by the row vectors of $B$. It therefore follows that $\hat{E}\{A \mid B\} = L_{21}Q_1^T = L_{21}L_{11}^{-1}B$. Also, from $AQ_1 = L_{21}$, we get $\hat{E}\{A \mid B\} = AQ_1Q_1^T$. Since $L_{22}Q_2^T$ is orthogonal to the row space of $B$, it follows that $\hat{E}\{A \mid B^\perp\} = L_{22}Q_2^T$.
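Problems 6.3 and 6.4 give two routes to the same orthogonal projection, and a small made-up example can confirm that they agree. The LQ factors are obtained here from an economy-size QR decomposition of the transpose, which is one common way to compute them; the tests also check that the residual is orthogonal to the rows of $B$ and that $\hat{E}\{A \mid B\} = AQ_1Q_1^T$.

```matlab
A = [1 2 0 1 3; 0 1 1 2 0];          % 2 x 5 (illustrative data)
B = [1 0 1 0 1; 0 1 0 1 1];          % 2 x 5, full row rank
P1 = (A*B')*((B*B')\B);              % Problem 6.3: projection of the rows of A
[Q, R] = qr([B; A]', 0);             % [B; A] = L*Q' with L = R' lower triangular
L = R';
L21 = L(3:4,1:2); L22 = L(3:4,3:4);
Q1 = Q(:,1:2); Q2 = Q(:,3:4);
P2 = L21*Q1';                        % Problem 6.4: projection via the LQ factors
```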


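6.5 Let $D = \begin{bmatrix} B \\ C \end{bmatrix}$. Then, $D$ has full row rank. Thus from Problem 6.3,

$$\hat{E}\{A \mid D\} = A[B^T\ \ C^T]\begin{bmatrix} BB^T & BC^T \\ CB^T & CC^T \end{bmatrix}^{-1}\begin{bmatrix} B \\ C \end{bmatrix}$$

We see that the above equation is expressed as

$$\hat{E}\{A \mid D\} = A[B^T\ \ C^T]\begin{bmatrix} BB^T & BC^T \\ CB^T & CC^T \end{bmatrix}^{-1}\begin{bmatrix} B \\ 0 \end{bmatrix} + A[B^T\ \ C^T]\begin{bmatrix} BB^T & BC^T \\ CB^T & CC^T \end{bmatrix}^{-1}\begin{bmatrix} 0 \\ C \end{bmatrix}$$

Since $\mathrm{span}\{B\} \cap \mathrm{span}\{C\} = \{0\}$, the first term on the right-hand side of the above equation is the oblique projection of the row vectors of $A$ onto the space spanned by the row vectors of $B$ along the row vectors of $C$. Thus we have

$$\hat{E}_{\parallel C}\{A \mid B\} = A[B^T\ \ C^T]\begin{bmatrix} BB^T & BC^T \\ CB^T & CC^T \end{bmatrix}^{-1}\begin{bmatrix} B \\ 0 \end{bmatrix}$$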


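6.6 Note that $R_{22} = \begin{bmatrix} L_{22} & 0 \\ L_{32} & L_{33} \end{bmatrix}$ and $R_{32} = [L_{42}\ \ L_{43}]$. Let $\begin{bmatrix} \eta \\ \xi \end{bmatrix} \in \mathrm{Ker}(R_{22})$. Then, $L_{22}\eta = 0$ and $L_{32}\eta + L_{33}\xi = 0$ hold. However, since $L_{22}$ is nonsingular, we have $\eta = 0$, so that $L_{33}\xi = 0$. Thus it suffices to show that $L_{33}\xi = 0$ implies $L_{43}\xi = 0$. Consider the following vectors

$$\begin{bmatrix} L_{13} \\ L_{23} \\ L_{33} \\ L_{43} \end{bmatrix}\xi = \begin{bmatrix} 0 \\ 0 \\ L_{33}\xi \\ L_{43}\xi \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ L_{43}\xi \end{bmatrix}$$

Lemmas 6.4 and 6.5 show that the above vector is also an input-output pair. However, since the past input-output and future inputs are zero, the future outputs must be zero, implying that $L_{43}\xi = 0$. This completes the proof.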


Chapter 7 
7.1 Let $Z(z) = B(z)/A(z)$ and let $z = e^{j\omega}$. Then, we have

$$Z(e^{j\omega}) = \frac{B(e^{j\omega})}{A(e^{j\omega})} = \frac{c(\omega) + jd(\omega)}{a(\omega) + jb(\omega)}$$

It thus follows that

$$\mathrm{Re}\,Z(e^{j\omega}) = \frac{a(\omega)c(\omega) + b(\omega)d(\omega)}{a^2(\omega) + b^2(\omega)}$$

Hence $Z(z)$ is positive real if $A(z)$ is stable and

$$a(\omega)c(\omega) + b(\omega)d(\omega) > 0, \qquad -\pi < \omega \le \pi \tag{E.7}$$

From the given first-order transfer function, we have

$$Z(e^{j\omega}) = \frac{c + b\cos\omega + jb\sin\omega}{a + \cos\omega + j\sin\omega}$$

Thus from (E.7), the positivity is satisfied if $z + a$ is stable and if

$$ac + b + (ab + c)\cos\omega > 0, \qquad -\pi < \omega \le \pi$$

It therefore follows that $|ab + c| < ac + b$ and $ac + b > 0$. Hence, we have

$$|a| < 1, \qquad |c| < b, \qquad b > 0$$

7.2 It can be shown that

$$\mathrm{Re}[A(e^{j\omega})] = 1 + a_1\cos\omega + a_2\cos 2\omega = 2a_2\cos^2\omega + a_1\cos\omega - a_2 + 1$$

For $a_2 = 0$, we see that the positive real condition is reduced to

$$a_1\cos\omega + 1 > 0, \qquad -\pi < \omega \le \pi$$

This is satisfied if and only if $-1 < a_1 < 1$.
In the following, we assume that $a_2 \neq 0$, and define

$$f(x) := 2x^2 + (a_1/a_2)x + 1/a_2 - 1, \qquad -1 \le x \le 1 \tag{E.8}$$

1. Suppose that $a_2 < 0$. Then, from (E.8), the positive real condition becomes

$$f(x) < 0, \qquad -1 \le x \le 1 \tag{E.9}$$

Since $f(0) = 1/a_2 - 1 < 0$ and $f$ is convex, (E.9) is satisfied if and only if $f(-1) < 0$ and $f(1) < 0$. Thus we have

$$a_2 + a_1 + 1 > 0, \qquad a_2 - a_1 + 1 > 0, \qquad a_2 < 0$$

2. Suppose that $a_2 > 0$. In this case, the positive real condition becomes

$$f(x) > 0, \qquad -1 \le x \le 1$$

Let $x_1 = -a_1/(4a_2)$ be the minimizer of $f$. According to the location of $x_1$, we have three cases:
a) If $x_1 < -1$, then $f(-1) > 0$. This implies that

$$0 < a_2 < a_1/4, \qquad a_2 > a_1 - 1$$

b) If $-1 \le x_1 \le 1$, then $f(x_1) > 0$, so that

$$a_2 \ge \frac{|a_1|}{4}, \qquad \frac{a_1^2}{2} + 4\left(a_2 - \frac{1}{2}\right)^2 < 1 \tag{E.10}$$

c) If $x_1 > 1$, then $f(1) > 0$. Hence, we have

$$0 < a_2 < -a_1/4, \qquad a_2 > -a_1 - 1$$

Thus, the region $D = \{(a_1, a_2) \mid \mathrm{Re}[A(e^{j\omega})] > 0\}$ is a convex set enclosed by the two lines $a_2 = \pm a_1 - 1$ and a portion of the ellipse in (E.10) [see Figure E.1]. The boundary curves are

$$a_2 = \pm a_1 - 1, \qquad \frac{a_1^2}{2} + 4\left(a_2 - \frac{1}{2}\right)^2 = 1 \tag{E.11}$$

and the two lines touch the ellipse at $(\pm 4/3, 1/3)$.

Figure E.1. Region of positive realness in $(a_1, a_2)$-plane


7.3 It is easy to see that $Z(z)$ is positive real if and only if

$$f(x) := 1 - a_1^2 - a_2^2 + 2a_1a_2x > 0, \qquad -1 \le x \le 1$$

Thus the condition is given by

$$|a_1 - a_2| < 1, \qquad |a_1 + a_2| < 1 \tag{E.12}$$


Remark E.1. It is instructive to compare the positive real conditions (E.11) and (E.12) above with the stability condition (E.1).


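7.4 Using the Frobenius norm, we have

$$\begin{aligned} \left\|\begin{bmatrix} A & B \\ C & D \end{bmatrix}\right\|_F^2 &= \mathrm{trace}\left(\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}^T\right) = \mathrm{trace}(AA^T + BB^T + CC^T + DD^T) \\ &= \|A\|_F^2 + \|B\|_F^2 + \|C\|_F^2 + \|D\|_F^2 \le (\|A\|_F + \|B\|_F + \|C\|_F + \|D\|_F)^2 \end{aligned}$$

Taking the square root of the above relation, we get the desired result.

For the 2-norm, we define $X = \begin{bmatrix} A \\ C \end{bmatrix}$ and $Y = \begin{bmatrix} B \\ D \end{bmatrix}$. By the definition of the 2-norm,

$$\|X\|_2^2 = \bar{\sigma}^2(X) = \lambda_{\max}(A^TA + C^TC) \le \lambda_{\max}(A^TA) + \lambda_{\max}(C^TC) = \|A\|_2^2 + \|C\|_2^2 \le (\|A\|_2 + \|C\|_2)^2$$

Thus we get $\|X\|_2 \le \|A\|_2 + \|C\|_2$. Similarly, we get $\|Y\|_2 \le \|B\|_2 + \|D\|_2$. Combining these results,

$$\|[X\ \ Y]\|_2^2 = \lambda_{\max}(XX^T + YY^T) \le \lambda_{\max}(XX^T) + \lambda_{\max}(YY^T) = \|X\|_2^2 + \|Y\|_2^2 \le (\|X\|_2 + \|Y\|_2)^2$$

Hence we have

$$\|[X\ \ Y]\|_2 \le \|X\|_2 + \|Y\|_2 \le \|A\|_2 + \|B\|_2 + \|C\|_2 + \|D\|_2$$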


7.5 $M(\Pi)$ is easily derived. Let $\Pi = 3$. Since $M(3) \ge 0$, we see that $\Pi = 3$ satisfies the LMI. Now suppose that $\Pi < 3$. It then follows from (7.35) that $\Pi \ge 1/3$. Hence we have $\Pi_* = 1/3$ and $\Pi^* = 3$, implying that the solutions of the LMI satisfy $1/3 \le \Pi \le 3$. Note that in this case $F := A - \bar{C}^T\Lambda^{-1}(0)C = 0$; see (5.91).
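7.6 By the definition of $\mathcal{C}_k$ and $T_-(k)$,

$$\Omega_{k+1} = \mathcal{C}_{k+1}T_-^{-1}(k + 1)\mathcal{C}_{k+1}^T = [\bar{C}^T\ \ A\mathcal{C}_k]\begin{bmatrix} \Lambda(0) & C\mathcal{C}_k \\ \mathcal{C}_k^TC^T & T_-(k) \end{bmatrix}^{-1}\begin{bmatrix} \bar{C} \\ \mathcal{C}_k^TA^T \end{bmatrix}$$

Note that this equation has the same form as (7.59). It is easy to see that $\Omega_k$ satisfies (7.62) by matching the blocks of the above equation with those in (7.60).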
7.7 First from (7.64), we note that

$$K(\Lambda(0) - C\Pi C^T) + A\Pi C^T = \bar{C}^T \tag{E.13}$$

Substituting $A = A_K + KC$ into (7.63) yields

$$\begin{aligned} \Pi &= (A_K + KC)\Pi(A_K + KC)^T + K(\Lambda(0) - C\Pi C^T)K^T \\ &= A_K\Pi A_K^T + KC\Pi C^TK^T + KC\Pi A_K^T + A_K\Pi C^TK^T + K(\Lambda(0) - C\Pi C^T)K^T \end{aligned}$$

Again by using $A_K = A - KC$, it follows that

$$\begin{aligned} \Pi &= A_K\Pi A_K^T + KC\Pi C^TK^T + KC\Pi(A - KC)^T + (A - KC)\Pi C^TK^T + K(\Lambda(0) - C\Pi C^T)K^T \\ &= A_K\Pi A_K^T + KC\Pi A^T + A\Pi C^TK^T - KC\Pi C^TK^T \\ &\quad + K(\Lambda(0) - C\Pi C^T)K^T + (K\Lambda(0)K^T - K\Lambda(0)K^T) \end{aligned}$$

Using (E.13) in the right-hand side of the above equation, we readily obtain the desired result, i.e., (7.65).


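7.8 In view of Subsection 7.4.2, the constraint is given by $\xi = \mathcal{O}^Tu$, so that the Lagrangian becomes

$$\mathcal{L} = u^Tu + \lambda^T(\xi - \mathcal{O}^Tu)$$

Differentiating $\mathcal{L}$ with respect to $u$ yields $2u - \mathcal{O}\lambda = 0$. Thus, from the constraint, we have

$$\xi - \mathcal{O}^T\mathcal{O}\lambda/2 = 0 \iff \lambda = 2(\mathcal{O}^T\mathcal{O})^{-1}\xi$$

so that $u = \mathcal{O}(\mathcal{O}^T\mathcal{O})^{-1}\xi$ holds. Hence we have $\min(u^Tu) = \xi^T(\mathcal{O}^T\mathcal{O})^{-1}\xi$.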
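The minimizer of Problem 7.8 is the minimum-norm solution of the underdetermined system $\mathcal{O}^Tu = \xi$, so it should coincide with pinv(O')*xi. A quick check with made-up data (O is any matrix with full column rank):

```matlab
O = [1 0; 2 1; 0 3; 1 1];            % 4 x 2, full column rank (illustrative)
xi = [1; -2];
u = O*((O'*O)\xi);                   % minimizer from the Lagrangian argument
J = u'*u;
Jformula = xi'*((O'*O)\xi);          % min u'u = xi'(O'O)^{-1} xi
u_pinv = pinv(O')*xi;                % minimum-norm solution of O'u = xi
```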


Chapter 8 
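8.1 It is easy to show that

$$\begin{bmatrix} L^T & 0 \\ 0 & M^T \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} L & 0 \\ 0 & M \end{bmatrix} = \begin{bmatrix} L^T\Sigma_{xx}L & L^T\Sigma_{xy}M \\ M^T\Sigma_{yx}L & M^T\Sigma_{yy}M \end{bmatrix}$$

Thus from (8.9) and (8.10), the result follows. Also, the computation of the determinant is immediate.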


8.2 Though this can be proved by using the orthogonality condition $a - Kb \perp b$, we give a different proof. See also Problem 6.3.
Since $I := \|a - Kb\|^2 = \mathrm{trace}\,E\{(a - Kb)(a - Kb)^T\}$,

$$I = \mathrm{trace}\left(E\{aa^T\} - E\{ab^T\}K^T - KE\{ba^T\} + KE\{bb^T\}K^T\right)$$

We see that the right-hand side is a quadratic form in $K = (k_{ij})$. Recall the formulas for the differentiation of the trace (e.g., see [185]):

$$\frac{\partial}{\partial X}\mathrm{trace}(AX) = A^T, \qquad \frac{\partial}{\partial X}\mathrm{trace}(AX^T) = A, \qquad \frac{\partial}{\partial X}\mathrm{trace}(AXBX^T) = A^TXB^T + AXB$$

Thus it follows that

$$\frac{\partial I}{\partial K} = -2E\{ab^T\} + 2KE\{bb^T\} = 0 \implies K = E\{ab^T\}(E\{bb^T\})^{-1}$$
8.3 Applying Lemma 4.11 to (8.48), we get

$$\Lambda_{yy}(l) = CA^l\Sigma C^T + CA^{l-1}KR = CA^{l-1}(A\Sigma C^T + KR) = CA^{l-1}\bar{C}^T$$

for $l = 1, 2, \ldots$. For $l = 0$, we have $\Lambda_{yy}(0) = C\Sigma C^T + R$, so that we have the desired result. We can show that Theorem 8.5 and Lemma 8.5 give the same result.

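8.4 Since $y$ is scalar, we note that $T_+ = T_-$, $L = M$, and $H^T = H$ hold. Thus $\tilde{H} := L^{-1} H L^{-T} = U \Sigma V^T$ is symmetric, so that $\tilde{H} = U \Sigma V^T = V \Sigma U^T$. Since $\operatorname{Im}(\tilde{H}) = \operatorname{Im}(U) = \operatorname{Im}(V)$ holds, there exists a nonsingular matrix $S \in \mathbb{R}^{n \times n}$ such that $U = VS$. Since $I_n = U^T U = S^T V^T V S = S^T S$, we see that $S$ is orthogonal. From $U \Sigma V^T = V \Sigma U^T$, we have $S \Sigma = \Sigma S^T$, so that, similarly to the proof of Lemma 5.2, we can show that $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$ holds. By using $\mathcal{O} = L U \Sigma^{1/2}$, $U = VS$, and $\Sigma = S \Sigma S$, we see that
$$
\mathcal{C} = M V \Sigma^{1/2} = L U S \Sigma^{1/2} = L U \Sigma^{1/2} S = \mathcal{O} S
$$
holds, where we used the fact that $S$ and $\Sigma^{1/2}$ are diagonal. Thus,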
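$\mathcal{C}^\dagger = S \mathcal{O}^\dagger$ and $\mathcal{C}^{\uparrow} = \mathcal{O}^{\uparrow} S$, or equivalently $\mathcal{O}^\dagger = S \mathcal{C}^\dagger$ and $\mathcal{O}^{\uparrow} = \mathcal{C}^{\uparrow} S$.
Hence, from (8.50), we get
$$
A = \mathcal{O}^\dagger \mathcal{O}^{\uparrow} = S\left(\mathcal{C}^\dagger \mathcal{C}^{\uparrow}\right) S = S A^T S
$$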


Also, from (8.52),
$$
\bar{C} = \mathcal{C}(1:p, 1:n) = \mathcal{O}(1:p, 1:n)\, S = C S
$$
which shows that $\bar{C} = C S$ holds.
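The key step of this solution is that the SVD of a symmetric matrix with distinct singular values satisfies $U = VS$ with $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$. The sketch below illustrates this with a hypothetical symmetric stand-in for $\tilde{H}$, built as $Q \operatorname{diag}(\lambda) Q^T$ from chosen eigenvalues. It takes $L = I$ for simplicity, which is an assumption made only for the illustration. The sign of each diagonal entry of $S$ is the sign of the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
lam = np.array([3.0, -2.0, 1.0, -0.5])              # eigenvalues with distinct |lambda|
Ht = Q @ np.diag(lam) @ Q.T                         # symmetric stand-in for H-tilde

U, s, Vt = np.linalg.svd(Ht)                        # Ht = U diag(s) V^T, V = Vt.T
S = Vt @ U                                          # U = V S  =>  S = V^T U

# S is diagonal with entries sign(lambda_i), ordered by decreasing |lambda_i|
print(np.round(S, 10))
print(np.allclose(s, np.abs(lam)))
```

The result does not depend on the sign conventions chosen by the SVD routine, because flipping a column of $U$ forces the same flip in the corresponding column of $V$.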

8.5 Since $H = L_1 U_1 \Sigma V_1^T M_1^T = L_2 U_2 \Sigma V_2^T M_2^T$, and since $\operatorname{Im}(L_1 U_1) = \operatorname{Im}(L_2 U_2)$, there exists a nonsingular $S \in \mathbb{R}^{n \times n}$ such that $L_2 U_2 = L_1 U_1 S$.
Note that $L_1 L_1^T = L_2 L_2^T$ holds. Thus $Z = L_1^{-1} L_2$ becomes an orthogonal matrix with $Z Z^T = Z^T Z = I$. This implies that $Z U_2$ has orthonormal columns, and hence, since $Z U_2 = U_1 S$, the matrix $S = U_1^T (Z U_2)$ is orthogonal. Again, using $L_1 U_1 \Sigma V_1^T M_1^T = L_2 U_2 \Sigma V_2^T M_2^T$, and noting that $M_1 M_1^T = M_2 M_2^T$, it follows that

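$$
U_1 \Sigma V_1^T = L_1^{-1} L_2 U_2 \Sigma V_2^T M_2^T M_1^{-T} = Z U_2 \Sigma V_2^T M_2^T M_1^{-T}
$$
so that
$$
U_1 \Sigma^2 U_1^T = Z U_2 \Sigma V_2^T M_2^T \left(M_1 M_1^T\right)^{-1} M_2 V_2 \Sigma U_2^T Z^T = Z U_2 \Sigma^2 U_2^T Z^T = U_1 S \Sigma^2 S^T U_1^T
$$
and hence
$$
\Sigma^2 = S \Sigma^2 S^T \tag{E.14}
$$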


It should be noted that $\Sigma^2$ is a diagonal matrix with distinct elements and that $S$ is orthogonal. Thus, similarly to Lemma 5.2, we have $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$. In fact,
suppose that
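$$
S = \begin{bmatrix} S_{n-1} & a \\ b^T & c \end{bmatrix}, \qquad S_{n-1} \in \mathbb{R}^{(n-1) \times (n-1)}, \quad a, b \in \mathbb{R}^{n-1}, \quad c \in \mathbb{R}
$$
Then, from $S^T S = I$, we have $\|a\|^2 + c^2 = 1$, and from (E.14), which is equivalent to $\Sigma^2 S = S \Sigma^2$,
$$
\begin{bmatrix} \Sigma_{n-1}^2 & 0 \\ 0 & \sigma_n^2 \end{bmatrix}
\begin{bmatrix} S_{n-1} & a \\ b^T & c \end{bmatrix}
=
\begin{bmatrix} S_{n-1} & a \\ b^T & c \end{bmatrix}
\begin{bmatrix} \Sigma_{n-1}^2 & 0 \\ 0 & \sigma_n^2 \end{bmatrix}
$$
so that
$$
\Sigma_{n-1}^2 a = \sigma_n^2 a, \qquad \Sigma_{n-1}^2 b = \sigma_n^2 b
$$
Since $\sigma_n^2$ is not an eigenvalue of $\Sigma_{n-1}^2$, we see that $a = 0$ and $b = 0$, so that $c^2 = 1$. Repeating the same argument for $S_{n-1}$, which is again orthogonal and commutes with $\Sigma_{n-1}^2$, yields $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$.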


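By using $L_2 U_2 = L_1 U_1 S$ and $U_1 \Sigma V_1^T = L_1^{-1} L_2 U_2 \Sigma V_2^T M_2^T M_1^{-T}$, we get
$$
\Sigma = S \Sigma V_2^T M_2^T M_1^{-T} V_1 = \Sigma S V_2^T M_2^T M_1^{-T} V_1 \quad \Longrightarrow \quad V_1^T M_1^{-1} M_2 V_2 S = I_n
$$
where we used the fact that $\det \Sigma \neq 0$. Since $M_1 V_1$ is a right inverse of $V_1^T M_1^{-1}$, and since $M_1^{-1} M_2$ is orthogonal (because $M_1 M_1^T = M_2 M_2^T$), so that $M_1^{-1} M_2 V_2 S$ has orthonormal columns just as $V_1$ does, we get $M_2 V_2 S = M_1 V_1$, so that $M_2 V_2 = M_1 V_1 S$. Also, from (8.41),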


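$$
\begin{aligned}
\mathcal{O}_2 &= L_2 U_2 \Sigma^{1/2} = L_1 U_1 S \Sigma^{1/2} = L_1 U_1 \Sigma^{1/2} S = \mathcal{O}_1 S \\
\mathcal{C}_2 &= M_2 V_2 \Sigma^{1/2} = M_1 V_1 S \Sigma^{1/2} = M_1 V_1 \Sigma^{1/2} S = \mathcal{C}_1 S
\end{aligned}
$$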


It thus follows from (8.50) that
$$
A_2 = \mathcal{O}_2^\dagger \mathcal{O}_2^{\uparrow} = S \mathcal{O}_1^\dagger \mathcal{O}_1^{\uparrow} S = S A_1 S
$$
Moreover, from (8.51) and (8.52), 


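$$
\begin{aligned}
C_2 &= \mathcal{O}_2(1:p, 1:n) = \mathcal{O}_1(1:p, 1:n)\, S = C_1 S \\
\bar{C}_2 &= \mathcal{C}_2(1:p, 1:n) = \mathcal{C}_1(1:p, 1:n)\, S = \bar{C}_1 S
\end{aligned}
$$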


From (8.53) and (8.54), we have 


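$$
\begin{aligned}
R_2 &= \Lambda(0) - C_2 \Sigma C_2^T = \Lambda(0) - C_1 S \Sigma S C_1^T = \Lambda(0) - C_1 \Sigma C_1^T = R_1 \\
K_2 &= \left(\bar{C}_2^T - A_2 \Sigma C_2^T\right) R_2^{-1} = \left(S \bar{C}_1^T - S A_1 S \Sigma S C_1^T\right) R_1^{-1} \\
&= S\left(\bar{C}_1^T - A_1 \Sigma C_1^T\right) R_1^{-1} = S K_1
\end{aligned}
$$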


This completes the proof. 
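The conclusion can be checked numerically: transforming a realization by a sign matrix $S = \operatorname{diag}(\pm 1, \ldots, \pm 1)$ leaves $R$ and the covariances $\Lambda(l) = C A^{l-1} \bar{C}^T$ unchanged and maps $K$ to $SK$. In this sketch $A_1$, $C_1$, $\bar{C}_1$, the diagonal $\Sigma$, $\Lambda(0)$, and the particular $S$ are all hypothetical, and $R$ and $K$ are computed with the expressions used in the last step of the proof.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 3, 2
A1 = 0.5 * rng.standard_normal((n, n))
C1 = rng.standard_normal((p, n))
Cbar1 = rng.standard_normal((p, n))
Sigma = np.diag([0.9, 0.6, 0.3])            # hypothetical diagonal (balanced) Sigma
Lam0 = C1 @ Sigma @ C1.T + 2.0 * np.eye(p)  # hypothetical Lambda(0), keeps R > 0

S = np.diag([1.0, -1.0, -1.0])              # one admissible sign matrix
A2, C2, Cbar2 = S @ A1 @ S, C1 @ S, Cbar1 @ S

def R_and_K(A, C, Cbar):
    R = Lam0 - C @ Sigma @ C.T                          # R = Lambda(0) - C Sigma C^T
    K = (Cbar.T - A @ Sigma @ C.T) @ np.linalg.inv(R)   # K = (Cbar^T - A Sigma C^T) R^{-1}
    return R, K

def Lam(A, C, Cbar, l):
    return C @ np.linalg.matrix_power(A, l - 1) @ Cbar.T  # Lambda(l), l >= 1

R1, K1 = R_and_K(A1, C1, Cbar1)
R2, K2 = R_and_K(A2, C2, Cbar2)
same_cov = all(np.allclose(Lam(A1, C1, Cbar1, l), Lam(A2, C2, Cbar2, l))
               for l in range(1, 5))
print(np.allclose(R2, R1), np.allclose(K2, S @ K1), same_cov)   # True True True
```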


Glossary 


Notation 


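$\mathbb{R}, \mathbb{C}, \mathbb{Z}$   real numbers, complex numbers, integers
$\mathbb{R}^n, \mathbb{C}^n$   n-dimensional real vectors, complex vectors
$\mathbb{R}^{m \times n}$   (m × n)-dimensional real matrices
$\mathbb{C}^{m \times n}$   (m × n)-dimensional complex matrices
$\dim(x)$   dimension of vector x
$\dim(\mathcal{V})$   dimension of subspace $\mathcal{V}$
$\mathcal{V} \vee \mathcal{W}$   vector sum of subspaces $\mathcal{V}$ and $\mathcal{W}$
$\mathcal{V} \dotplus \mathcal{W}$   direct sum of subspaces $\mathcal{V}$ and $\mathcal{W}$
$\operatorname{span}\{v, w, x\}$   subspace generated by vectors v, w, x
$A^T, A^*$   transpose of $A \in \mathbb{R}^{m \times n}$, conjugate transpose of $A \in \mathbb{C}^{m \times n}$
$A^{-1}, A^{-T}$   inverse and transpose of the inverse of A
$A^\dagger$   pseudo-inverse of A
$A \ge 0$   symmetric, nonnegative definite
$A > 0$   symmetric, positive definite
$A^{1/2}$   square root of A
$\det(A)$   determinant of A
$\operatorname{trace}(A)$   trace of A
$\operatorname{rank}(A)$   rank of A
$\lambda(A), \lambda_i(A)$   eigenvalue, ith eigenvalue of A
$\rho(A)$   spectral radius, i.e., $\max_i |\lambda_i(A)|$
$\sigma(A), \sigma_i(A)$   singular value, ith singular value of A
$\underline{\sigma}(A), \overline{\sigma}(A)$   minimum singular value, maximum singular value of A
$\operatorname{Im}(A)$   image (or range) of A
$\operatorname{Ker}(A)$   kernel (or null space) of A
$\|x\|_2, \|x\|_\infty$   2-norm, ∞-norm of x
$\|A\|_2, \|A\|_F$   2-norm, Frobenius norm of A
$\left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right]$   transfer matrix $G(z) = D + C(zI - A)^{-1}B$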




$E\{x\}$   mathematical expectation of random vector x
$\operatorname{cov}\{x, y\}$   (cross-) covariance matrix of random vectors x and y
$N(\mu, \Sigma)$   Gaussian (normal) distribution with mean $\mu$ and covariance matrix $\Sigma$
$E\{x \mid y\}$   conditional expectation of x given y
$\langle x, y \rangle_{\mathcal{H}}$   inner product of x and y in Hilbert space $\mathcal{H}$
$\|x\|_{\mathcal{H}}$   norm of x in Hilbert space $\mathcal{H}$
$\operatorname{Span}\{\cdots\}$   closed Hilbert subspace generated by infinitely many elements $\{\cdots\}$
$\hat{E}\{x \mid \mathcal{Y}\}$   orthogonal projection of x onto subspace $\mathcal{Y}$
$\hat{E}_{\|\mathcal{Z}}\{x \mid \mathcal{Y}\}$   oblique projection of x onto $\mathcal{Y}$ along $\mathcal{Z}$
$a := b$   a is defined by b
$a =: b$   b is defined by a
$\mathcal{Z}$   z-transform operator
$z$   complex variable, shift operator $z f(t) := f(t+1)$
Re   real part
$\operatorname{Ric}(\cdot)$   Riccati operator; (7.34)

Abbreviations 

AIC Akaike Information Criterion; see Section 1.1 

AR AutoRegressive; (4.33) 

ARMA AutoRegressive Moving Average; (4.34) 

ARMAX AutoRegressive Moving Average with eXogenous input; (1.4) 

ARX AutoRegressive with eXogenous input; (A.7) 

ARE Algebraic Riccati Equation; (5.67) 

ARI Algebraic Riccati Inequality; (7.35) 

BIBO Bounded-Input, Bounded-Output; see Section 3.2 

CCA Canonical Correlation Analysis; see Section 8.1 

CVA Canonical Variate Analysis; see Section 10.8 

FIR Finite Impulse Response; (A.12) 

IV Instrumental Variable; see Section A.1 

LMI Linear Matrix Inequality; see (7.26) 

LTI Linear Time-Invariant; see Section 3.2 

MA Moving Average; (4.44) 

MIMO Multi-Input, Multi-Output; see Section 1.3 

ML Maximum Likelihood; see Section 1.1 

MOESP Multivariable Output Error State sPace; see Section 6.5 

N4SID Numerical algorithms for Subspace State Space System 
IDentification; see Section 6.6 

ORT ORThogonal decomposition based; see Section 9.7 

PE Persistently Exciting; see Section 6.3 and Appendix B

PEM Prediction Error Method; see Sections 1.2 and 1.3 

PO-MOESP Past Output MOESP; see Section 6.6 

SISO Single-Input, Single-Output; see Section 3.2 


SVD Singular Value Decomposition; see (2.26) 